Multiple Kernel Learning for Gene Prioritization, Clustering, and Functional Enrichment Analysis

Millis, David Howard

dc.contributor.advisor	Solka, Jeffrey L.
dc.contributor.author	Millis, David Howard
dc.creator	Millis, David Howard
dc.date.accessioned	2014-09-18T01:56:11Z
dc.date.available	2014-09-18T01:56:11Z
dc.date.issued	2014-05
dc.description.abstract	Gene prioritization is the process of ranking a list of candidate genes such that the genes that are most likely involved in a biological process of interest receive the highest rankings. In a supervised learning approach to gene prioritization, candidate genes are ranked in terms of their degree of similarity to genes that have already been shown to be involved in the process of interest. Gene prioritization thus can be cast as a classification task, in which a training set of genes and data associated with those genes is used to train a classifier to assign rankings to unknown genes, based on their degree of similarity to the training genes. This thesis describes the use of kernel methods, and particularly a method known as multiple kernel learning, for combining information from multiple data sources for purposes of gene prioritization. Multiple kernel learning facilitates the incorporation of heterogeneous data types into the assessment of similarity among genes. In addition, the rows of the kernel matrix can be repurposed as feature vectors. We apply clustering methods to these vectors to partition the gene list into related groups. We then perform functional enrichment analysis on the gene clusters to identify biological functions that are significantly represented in each gene cluster. We thus are able to use a single data structure, namely a kernel matrix representing similarities among genes based on multiple information sources, as the basis for three common types of bioinformatics analysis: gene prioritization, gene clustering, and functional annotation analysis of gene lists. This research contributes to the exploration of methods for extracting useful biological insights from the continually expanding knowledge base of biological data.
dc.format.extent	128 pages
dc.identifier.uri	https://hdl.handle.net/1920/8892
dc.language.iso	en
dc.rights	Copyright 2014 David Howard Millis
dc.subject	Bioinformatics
dc.subject	Functional enrichment
dc.subject	Gene clustering
dc.subject	Gene prioritization
dc.subject	Multiple kernel learning
dc.subject	Support vector machines
dc.title	Multiple Kernel Learning for Gene Prioritization, Clustering, and Functional Enrichment Analysis
dc.type	Dissertation
thesis.degree.discipline	Bioinformatics and Computational Biology
thesis.degree.grantor	George Mason University
thesis.degree.level	Doctoral
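
The abstract describes combining several data sources into one kernel matrix and ranking candidate genes by their similarity to known genes. The following is a minimal sketch of that idea, not the method used in the thesis: the toy data, the fixed kernel weights, and the helper names (`rbf_kernel`, `linear_kernel`, `combine_kernels`, `prioritize`) are all assumptions made for illustration. Multiple kernel learning would normally learn the weights, and the thesis lists support vector machines as a subject; here a simple mean-similarity score stands in for a trained classifier.

```python
import math

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix over the rows of X."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
             for y in X] for x in X]

def linear_kernel(X):
    """Linear (dot-product) kernel matrix over the rows of X."""
    return [[sum(a * b for a, b in zip(x, y)) for y in X] for x in X]

def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices.

    Non-negative weights keep the result a valid kernel. The weights are
    fixed by hand here (an assumption); MKL would learn them from data.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    n = len(kernels[0])
    return [[sum(w * K[i][j] for K, w in zip(kernels, weights))
             for j in range(n)] for i in range(n)]

def prioritize(K, train_idx, candidate_idx):
    """Rank candidates by mean kernel similarity to the training genes."""
    scores = {c: sum(K[c][t] for t in train_idx) / len(train_idx)
              for c in candidate_idx}
    return sorted(candidate_idx, key=lambda c: scores[c], reverse=True)

# Hypothetical toy data: four genes described by two data sources.
expression = [[1.0, 0.9], [0.9, 1.0], [1.0, 1.0], [-1.0, -0.8]]
annotation = [[1, 0], [1, 0], [1, 0], [0, 1]]

K = combine_kernels([rbf_kernel(expression), linear_kernel(annotation)],
                    weights=[0.5, 0.5])

# Genes 0 and 1 are known to be involved; genes 2 and 3 are candidates.
ranking = prioritize(K, train_idx=[0, 1], candidate_idx=[2, 3])
print(ranking)  # [2, 3]
```

Each row of `K` can also serve as a feature vector for its gene, which is how the abstract describes reusing the same matrix for clustering.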

Files

Original bundle

Name: Millis_gmu_0883E_10604.pdf
Size: 920.79 KB
Format: Adobe Portable Document Format

Collections

College of Science