Computational Geometry Approach to the Analysis of Organism-Dependent Features in Protein Structures

Date

2010-02-16T20:14:08Z

Authors

Luo, Yong

Abstract

Recent research on organism-dependent features of proteins has focused mostly on the analysis of primary sequences, while studies of structural differences are rare. General organism-dependent structural features are obscured by the strong sequence and structural similarities between homologous proteins across different genomes. In this work we implemented protein structure descriptors based on Delaunay tessellation of the structures. Delaunay tessellation identifies quadruplets of nearest-neighbor residues, for which we enumerated all possible residue compositions using the full 20-letter amino acid alphabet as well as a number of reduced alphabets. Feature vectors based on these descriptors were generated to represent the organism-dependent features of an individual protein structure. Protein and domain structures from a series of organisms were collected, and we applied supervised machine learning techniques to develop classifiers for proteins from different organisms. The ability of these classifiers to discriminate between organisms strongly indicates the presence of organism-dependent signals in protein structure. Their discrimination capability depends strongly on the reduced residue alphabet used in the modeling. Comparing model performance across different residue alphabet reduction schemes and organism pairs provides novel insights into the evolution of protein structure.
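
The descriptor outlined in the abstract can be illustrated with a minimal sketch. It assumes the tessellation has already been computed and is supplied as a list of quadruplets of residue indices (in practice these could come from something like `scipy.spatial.Delaunay(points).simplices` applied to one representative point per residue, which is an assumption here, not a detail given in the abstract). The four-group reduced alphabet, the names `GROUPS`, `composition_keys`, and `feature_vector`, and the toy sequence are all hypothetical and chosen only for illustration; the alphabets actually used in the work are not listed in this record.

```python
from collections import Counter
from itertools import combinations_with_replacement

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical 4-letter reduced alphabet (illustrative grouping only,
# not taken from the original work).
GROUPS = {"h": "AVLIMFWC", "p": "STNQYGPH", "+": "KR", "-": "DE"}
REDUCED = {aa: group for group, members in GROUPS.items() for aa in members}

# The full 20-letter alphabet maps every residue to itself.
FULL = {aa: aa for aa in AMINO_ACIDS}


def composition_keys(mapping):
    """All order-independent quadruplet compositions for an alphabet."""
    letters = sorted(set(mapping.values()))
    return list(combinations_with_replacement(letters, 4))


def feature_vector(sequence, simplices, mapping):
    """Count how often each quadruplet composition occurs.

    sequence  : one-letter residue codes, indexed like the tessellated points
    simplices : iterable of 4-tuples of residue indices (the tetrahedra)
    mapping   : residue letter -> alphabet letter (full or reduced)
    """
    counts = Counter(
        tuple(sorted(mapping[sequence[i]] for i in simplex))
        for simplex in simplices
    )
    return [counts[key] for key in composition_keys(mapping)]


# Toy input: five residues and two tetrahedra sharing three vertices.
sequence = "AKDES"
simplices = [(0, 1, 2, 3), (1, 2, 3, 4)]

reduced_vec = feature_vector(sequence, simplices, REDUCED)
full_vec = feature_vector(sequence, simplices, FULL)
```

With an alphabet of n letters there are C(n + 3, 4) possible compositions, so the full alphabet gives a vector of length 8855 while the illustrative four-letter alphabet gives 35. This shows why the choice of reduced alphabet directly changes the feature space the classifiers see.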

Keywords

Protein structure, Computational geometry, Residue alphabet reduction, Delaunay tessellation, Organism-dependent features, Machine learning