dc.contributor.author |
Luo, Yong |
|
dc.creator |
Luo, Yong |
|
dc.date |
2009-12-11 |
|
dc.date.accessioned |
2010-02-16T20:14:08Z |
|
dc.date.available |
NO_RESTRICTION |
en_US |
dc.date.available |
2010-02-16T20:14:08Z |
|
dc.date.issued |
2010-02-16T20:14:08Z |
|
dc.identifier.uri |
https://hdl.handle.net/1920/5691 |
|
dc.description.abstract |
Recent research on organism-dependent features of proteins has focused mostly on
the analysis of their primary sequences, while studies of their structural differences are
rare. General organism-dependent structural features in proteins are obscured by the
strong sequence and structural similarities between homologous proteins across
different genomes. In this work we implemented protein structure descriptors based on
Delaunay tessellation of the structures. Delaunay tessellation identifies the quadruplets
of nearest-neighbor residues, in which we enumerated all possible residue compositions
using the full 20-letter alphabet as well as a number of reduced alphabets. Feature vectors
based on these descriptors were generated to represent organism-dependent features of an
individual protein structure. Protein and domain structures of a series of organisms were
collected. We applied supervised machine learning techniques to develop classifiers for
proteins from different organisms. The classifiers' ability to discriminate between organisms strongly indicates the presence of
organism-dependent signals in protein structure. The discrimination capability of
the machine learning models depends strongly on the reduced residue alphabet used in
the modeling. Comparison of the model performance with different amino acid residue
alphabet reduction schemes and organism pairs provides novel insights into the evolution
of protein structure. |
|
dc.language.iso |
en_US |
en_US |
dc.subject |
protein structure |
en_US |
dc.subject |
computational geometry |
en_US |
dc.subject |
residue alphabet reduction |
en_US |
dc.subject |
Delaunay tessellation |
en_US |
dc.subject |
organism-dependent features |
en_US |
dc.subject |
Machine learning |
en_US |
dc.title |
Computational Geometry Approach to the Analysis of Organism-Dependent Features in Protein Structures |
en_US |
dc.type |
Dissertation |
en |
dc.description.note |
Supporting data in the form of Microsoft Excel documents is included. |
en_US |
thesis.degree.name |
Doctor of Philosophy in Bioinformatics and Computational Biology |
en_US |
thesis.degree.level |
Doctoral |
en |
thesis.degree.discipline |
Bioinformatics and Computational Biology |
en |
thesis.degree.grantor |
George Mason University |
en |