Multiple Kernel Learning for Gene Prioritization, Clustering, and Functional Enrichment Analysis

dc.contributor.advisor: Solka, Jeffrey L.
dc.contributor.author: Millis, David Howard
dc.creator: Millis, David Howard
dc.date.accessioned: 2014-09-18T01:56:11Z
dc.date.available: 2014-09-18T01:56:11Z
dc.date.issued: 2014-05
dc.description.abstract: Gene prioritization is the process of ranking a list of candidate genes such that the genes that are most likely involved in a biological process of interest receive the highest rankings. In a supervised learning approach to gene prioritization, candidate genes are ranked in terms of their degree of similarity to genes that have already been shown to be involved in the process of interest. Gene prioritization thus can be cast as a classification task, in which a training set of genes and data associated with those genes is used to train a classifier to assign rankings to unknown genes, based on their degree of similarity to the training genes. This thesis describes the use of kernel methods, and particularly a method known as multiple kernel learning, for combining information from multiple data sources for purposes of gene prioritization. Multiple kernel learning facilitates the incorporation of heterogeneous data types into the assessment of similarity among genes. In addition, the rows of the kernel matrix can be repurposed as feature vectors. We apply clustering methods to these vectors to partition the gene list into related groups. We then perform functional enrichment analysis on the gene clusters to identify biological functions that are significantly represented in each gene cluster. We thus are able to use a single data structure, namely a kernel matrix representing similarities among genes based on multiple information sources, as the basis for three common types of bioinformatics analysis: gene prioritization, gene clustering, and functional annotation analysis of gene lists. This research contributes to the exploration of methods for extracting useful biological insights from the continually expanding knowledge base of biological data.
dc.format.extent: 128 pages
dc.identifier.uri: https://hdl.handle.net/1920/8892
dc.language.iso: en
dc.rights: Copyright 2014 David Howard Millis
dc.subject: Bioinformatics
dc.subject: Functional enrichment
dc.subject: Gene clustering
dc.subject: Gene prioritization
dc.subject: Multiple kernel learning
dc.subject: Support vector machines
dc.title: Multiple Kernel Learning for Gene Prioritization, Clustering, and Functional Enrichment Analysis
dc.type: Dissertation
thesis.degree.discipline: Bioinformatics and Computational Biology
thesis.degree.grantor: George Mason University
thesis.degree.level: Doctoral
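
The abstract describes combining several data sources into one kernel matrix, ranking candidate genes by similarity to known genes, and reusing the rows of that matrix as feature vectors. The following is a minimal Python sketch of that idea, not the dissertation's implementation. Everything in it is an illustrative assumption: the toy data, the two data sources (`expression`, `annotation`), the fixed kernel weights (a true multiple kernel learning method would learn the weights together with the classifier), and the mean-similarity score that stands in for a trained classifier such as a support vector machine.

```python
import math

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix over the rows of X."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
             for y in X] for x in X]

def linear_kernel(X):
    """Linear (dot-product) kernel matrix over the rows of X."""
    return [[sum(a * b for a, b in zip(x, y)) for y in X] for x in X]

def normalize_kernel(K):
    """Cosine-normalize K so that every diagonal entry equals 1."""
    n = len(K)
    return [[K[i][j] / math.sqrt(K[i][i] * K[j][j]) for j in range(n)]
            for i in range(n)]

def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices (weights are fixed here, not learned)."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for K, w in zip(kernels, weights))
             for j in range(n)] for i in range(n)]

def rank_candidates(K, train_idx, cand_idx):
    """Order candidates by mean kernel similarity to the training genes."""
    scores = {c: sum(K[c][t] for t in train_idx) / len(train_idx)
              for c in cand_idx}
    return sorted(cand_idx, key=lambda c: scores[c], reverse=True)

# Hypothetical toy data: five genes described by two heterogeneous sources.
expression = [[1.0, 0.9], [0.9, 1.0], [1.0, 1.0], [0.0, 0.1], [0.1, 0.0]]
annotation = [[1, 1, 0], [1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]

kernels = [normalize_kernel(rbf_kernel(expression)),
           normalize_kernel(linear_kernel(annotation))]
K = combine_kernels(kernels, weights=[0.5, 0.5])

# Genes 0 and 1 play the role of known (training) genes; 2, 3, 4 are candidates.
ranking = rank_candidates(K, train_idx=[0, 1], cand_idx=[2, 3, 4])

# Each row of K describes one gene by its similarity to every other gene,
# so the rows can be passed to any clustering algorithm as feature vectors.
feature_vectors = [row[:] for row in K]
```

Normalizing each kernel before summing keeps one data source from dominating the combination purely because of its scale, which is one common reason to normalize when mixing heterogeneous kernels.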

Files

Original bundle
Name: Millis_gmu_0883E_10604.pdf
Size: 920.79 KB
Format: Adobe Portable Document Format