Mason Archival Repository Service

Classification of Thermophilic and Mesophilic Proteins Using N-Grams

dc.contributor.advisor Vaismann, Iosif Elattar, Marwy
dc.creator Elattar, Marwy 2016-11-27 2017-12-07T21:15:29Z 2017-12-07T21:15:29Z
dc.identifier doi:10.13021/G8DM5F
dc.description.abstract This project applies machine learning to classify thermophilic and mesophilic proteins using an n-gram representation of protein sequences. Two datasets containing proteins from both classes were analyzed. Alphabet reduction was applied to both datasets, and n-gram frequencies were calculated for each sequence over the reduced alphabet. The data were normalized by calculating n-gram likelihoods. Four machine learning algorithms were used for classification: Naïve Bayes, Support Vector Machines (SVM), Decision Trees, and Random Forests. The reported accuracies were 100.0% for SVM, 99.6% for Decision Trees, 99.3% for Random Forests, and 90.3% for Naïve Bayes.
dc.language.iso en en_US
dc.subject thermophilic en_US
dc.subject mesophilic en_US
dc.subject n-grams en_US
dc.subject machine learning en_US
dc.title Classification of Thermophilic and Mesophilic Proteins Using N-Grams en_US
dc.type Thesis en_US Master of Science in Bioinformatics and Computational Biology en_US Master's en_US Bioinformatics and Computational Biology en_US George Mason University en_US
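
The feature extraction summarized in the abstract (alphabet reduction followed by normalized n-gram frequencies) might look roughly like the sketch below. This is an illustration only, not the thesis's code: the record does not say which reduced alphabet or n-gram length was used, so the five-group mapping `REDUCED_GROUPS`, the default `n=2`, and the function names are all hypothetical. "Likelihood" is interpreted here, as an assumption, as the relative frequency of each n-gram within a sequence.

```python
from collections import Counter

# Hypothetical 5-letter reduced alphabet (assumption: the record does not
# state the grouping actually used). Each key is the group symbol and each
# value lists the standard amino acids mapped to it.
REDUCED_GROUPS = {
    "A": "AVLIMC",  # aliphatic / hydrophobic
    "R": "FWYH",    # aromatic
    "P": "STNQ",    # polar, uncharged
    "C": "KRDE",    # charged
    "G": "GP",      # glycine and proline
}
RESIDUE_TO_GROUP = {
    aa: group for group, members in REDUCED_GROUPS.items() for aa in members
}


def reduce_sequence(seq):
    """Map each residue to its group symbol.

    Symbols outside the 20 standard amino acids (e.g. 'X') are dropped,
    which joins their neighbours; this is a simplification.
    """
    return "".join(
        RESIDUE_TO_GROUP[aa] for aa in seq.upper() if aa in RESIDUE_TO_GROUP
    )


def ngram_frequencies(seq, n=2):
    """Return relative frequencies of overlapping n-grams in the reduced sequence."""
    reduced = reduce_sequence(seq)
    counts = Counter(reduced[i:i + n] for i in range(len(reduced) - n + 1))
    total = sum(counts.values())
    if total == 0:
        return {}  # sequence shorter than n
    return {gram: count / total for gram, count in counts.items()}
```

Each sequence then yields a small dictionary of n-gram frequencies, which could be turned into a fixed-length vector (one column per possible n-gram) and passed to any of the four classifier types named in the abstract.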
