Classification of Thermophilic and Mesophilic Proteins Using N-Grams

dc.contributor.advisor: Vaismann, Iosif
dc.contributor.author: Elattar, Marwy
dc.creator: Elattar, Marwy
dc.date: 2016-11-27
dc.date.accessioned: 2017-12-07T21:15:29Z
dc.date.available: 2017-12-07T21:15:29Z
dc.description.abstract: This project addresses machine learning classification of thermophilic and mesophilic proteins using an n-gram-based representation of protein sequences. Two datasets containing proteins from both classes were analyzed. Alphabet reduction was applied to both datasets, and n-gram frequencies were calculated for each sequence using the reduced alphabet. The data were normalized by calculating n-gram likelihoods. Four machine learning algorithms (Naïve Bayes, Support Vector Machines, Decision Trees, and Random Forests) were used for protein classification. The reported accuracies were 100.0% with SVM, 99.6% with Decision Trees, 99.3% with Random Forests, and 90.3% with Naïve Bayes.
dc.identifier: doi:10.13021/G8DM5F
dc.identifier.uri: https://hdl.handle.net/1920/10794
dc.language.iso: en
dc.subject: Thermophilic
dc.subject: Mesophilic
dc.subject: N-grams
dc.subject: Machine learning
dc.title: Classification of Thermophilic and Mesophilic Proteins Using N-Grams
dc.type: Thesis
thesis.degree.discipline: Bioinformatics and Computational Biology
thesis.degree.grantor: George Mason University
thesis.degree.level: Master's
thesis.degree.name: Master of Science in Bioinformatics and Computational Biology
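
The feature-extraction steps named in the abstract (alphabet reduction, n-gram counting, normalization) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The six-group amino acid reduction in `GROUPS` is a hypothetical grouping chosen for illustration, and reading "n-gram likelihoods" as relative frequencies (count divided by the total number of n-grams in the sequence) is an assumption, since the abstract does not define either. The function names `reduce_sequence` and `ngram_likelihoods` are likewise illustrative.

```python
from collections import Counter

# Hypothetical reduced alphabet: each key is the symbol that replaces every
# amino acid in its group. The abstract does not state the grouping the
# thesis used; this one is for illustration only.
GROUPS = {
    "A": "AVLIMC",  # hydrophobic / aliphatic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # glycine / proline
}
REDUCE = {aa: symbol for symbol, members in GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map a protein sequence onto the reduced alphabet, skipping unknown letters."""
    return "".join(REDUCE[aa] for aa in seq.upper() if aa in REDUCE)


def ngram_likelihoods(seq, n=2):
    """Return relative frequencies of overlapping n-grams in the reduced sequence.

    Assumption: "likelihood" is taken to mean count / total n-grams.
    """
    reduced = reduce_sequence(seq)
    grams = [reduced[i:i + n] for i in range(len(reduced) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    return {gram: count / total for gram, count in Counter(grams).items()}


if __name__ == "__main__":
    features = ngram_likelihoods("MKVLDE", n=2)
    for gram, value in sorted(features.items()):
        print(gram, round(value, 3))
```

Each sequence then becomes a dictionary of n-gram features that could be placed in a fixed-order vector and passed to a classifier such as those listed in the abstract.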

Files

Original bundle
Name: Elattar_thesis_2016.pdf
Size: 1.33 MB
Format: Adobe Portable Document Format