Mason Archival Repository Service

Classification of Thermophilic and Mesophilic Proteins Using N-Grams

dc.contributor.advisor Vaisman, Iosif
dc.contributor.author Elattar, Marwy
dc.creator Elattar, Marwy
dc.date 2016-11-27
dc.date.accessioned 2017-12-07T21:15:29Z
dc.date.available 2017-12-07T21:15:29Z
dc.identifier doi:10.13021/G8DM5F
dc.identifier.uri https://hdl.handle.net/1920/10794
dc.description.abstract This project applies machine learning to the classification of thermophilic and mesophilic proteins, using an n-gram-based representation of protein sequences. Two datasets containing proteins from both classes were analyzed. Alphabet reduction was performed on all datasets, and n-gram frequencies were calculated for each sequence using the reduced alphabet. The data were normalized by calculating n-gram likelihoods. Four machine learning algorithms (Naïve Bayes, Support Vector Machines, Decision Trees, and Random Forests) were used for the protein classification. Accuracies of 100.0% were achieved using SVM, 99.6% using Decision Trees, 99.3% using Random Forests, and 90.3% using Naïve Bayes.
dc.language.iso en en_US
dc.subject thermophilic en_US
dc.subject mesophilic en_US
dc.subject n-grams en_US
dc.subject machine learning en_US
dc.title Classification of Thermophilic and Mesophilic Proteins Using N-Grams en_US
dc.type Thesis en_US
thesis.degree.name Master of Science in Bioinformatics and Computational Biology en_US
thesis.degree.level Master's en_US
thesis.degree.discipline Bioinformatics and Computational Biology en_US
thesis.degree.grantor George Mason University en_US
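
The feature-extraction steps named in the abstract (alphabet reduction, n-gram counting, normalization to likelihoods) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the six-group reduced alphabet, the function names, and the reading of "n-gram likelihood" as the relative frequency of each n-gram within a sequence are all hypothetical choices made for the example.

```python
from collections import Counter
from itertools import product

# Hypothetical reduced alphabet (an assumption, not the scheme used in the thesis):
# each group of amino acids collapses to one representative letter.
REDUCED_GROUPS = {
    "A": "AVLIMC",  # aliphatic / hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # small / conformationally special
}
RESIDUE_TO_GROUP = {
    aa: rep for rep, members in REDUCED_GROUPS.items() for aa in members
}


def reduce_alphabet(sequence):
    """Map each residue to its group letter; skip unknown symbols such as X."""
    return "".join(
        RESIDUE_TO_GROUP[aa] for aa in sequence.upper() if aa in RESIDUE_TO_GROUP
    )


def ngram_likelihoods(sequence, n=2):
    """Return relative n-gram frequencies as a vector in a fixed vocabulary order."""
    reduced = reduce_alphabet(sequence)
    counts = Counter(reduced[i:i + n] for i in range(len(reduced) - n + 1))
    total = sum(counts.values())
    # Enumerate every possible n-gram so all sequences share the same feature layout.
    vocabulary = ["".join(p) for p in product(sorted(REDUCED_GROUPS), repeat=n)]
    return [counts[gram] / total if total else 0.0 for gram in vocabulary]


# Toy sequence, made up for illustration only.
features = ngram_likelihoods("MKVLAAGDE", n=2)
```

Each sequence becomes a fixed-length vector (36 entries for bigrams over six groups), which could then be passed to any of the classifiers listed in the abstract. Normalizing by the total n-gram count keeps sequences of different lengths comparable.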

