Classification of Thermophilic and Mesophilic Proteins Using N-Grams
dc.contributor.advisor | Vaismann, Iosif | |
dc.contributor.author | Elattar, Marwy | |
dc.creator | Elattar, Marwy | |
dc.date | 2016-11-27 | |
dc.date.accessioned | 2017-12-07T21:15:29Z | |
dc.date.available | 2017-12-07T21:15:29Z | |
dc.description.abstract | This project focuses on machine learning classification of thermophilic and mesophilic proteins using an n-gram representation of protein sequences. Two datasets containing proteins from both classes were analyzed. Alphabet reduction was applied to both datasets, and n-gram frequencies were calculated for each sequence over the reduced alphabet. The data were normalized by calculating n-gram likelihoods. Four machine learning algorithms (Naïve Bayes, Support Vector Machines, Decision Trees, and Random Forests) were used for classification. The accuracies achieved were 100.0% with SVM, 99.6% with Decision Trees, 99.3% with Random Forests, and 90.3% with Naïve Bayes. | |
dc.identifier | doi:10.13021/G8DM5F | |
dc.identifier.uri | https://hdl.handle.net/1920/10794 | |
dc.language.iso | en | |
dc.subject | Thermophilic | |
dc.subject | Mesophilic | |
dc.subject | N-grams | |
dc.subject | Machine learning | |
dc.title | Classification of Thermophilic and Mesophilic Proteins Using N-Grams | |
dc.type | Thesis | |
thesis.degree.discipline | Bioinformatics and Computational Biology | |
thesis.degree.grantor | George Mason University | |
thesis.degree.level | Master's | |
thesis.degree.name | Master of Science in Bioinformatics and Computational Biology |
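
The feature-extraction steps named in the abstract (alphabet reduction, n-gram counting, normalization) can be illustrated with a minimal sketch. This is not the thesis's implementation: the six-group reduced alphabet, the function names, and the reading of "n-gram likelihood" as the relative frequency of each n-gram within a sequence are all assumptions made for illustration, since the record does not specify them.

```python
from collections import Counter
from itertools import product

# Hypothetical reduced alphabet (assumption): the record does not state
# which amino-acid grouping the thesis used.
REDUCED_GROUPS = {
    "h": "AVLIMC",  # hydrophobic
    "a": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "k": "KR",      # positively charged
    "d": "DE",      # negatively charged
    "g": "GP",      # glycine / proline
}
AA_TO_GROUP = {aa: group for group, aas in REDUCED_GROUPS.items() for aa in aas}


def reduce_sequence(seq):
    """Map each residue to its group symbol, skipping unknown characters."""
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper() if aa in AA_TO_GROUP)


def ngram_likelihoods(seq, n=2):
    """Return the relative frequency of every possible reduced-alphabet n-gram.

    Assumption: "likelihood" is taken here to mean count / total n-grams
    in the sequence, so the values of one sequence sum to 1.
    """
    reduced = reduce_sequence(seq)
    counts = Counter(reduced[i:i + n] for i in range(len(reduced) - n + 1))
    total = sum(counts.values())
    # Fixed vocabulary so every sequence yields a vector of the same length.
    vocab = ["".join(p) for p in product(sorted(REDUCED_GROUPS), repeat=n)]
    return {gram: (counts[gram] / total if total else 0.0) for gram in vocab}


features = ngram_likelihoods("MKVLAAGIDE", n=2)
```

Each sequence then yields a fixed-length vector (36 values for bigrams over six groups) that could be passed to any of the four classifiers listed in the abstract, for example through a library such as scikit-learn.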