Automatic Functional Annotation of Prokaryotes

dc.contributor.advisor    Vaisman, Iosif I.
dc.contributor.author     Goetz, Philip
dc.creator                Goetz, Philip
dc.date                   2011-12-06
dc.date.accessioned       2012-02-01T20:45:58Z
dc.date.available         NO_RESTRICTION
dc.date.available         2012-02-01T20:45:58Z
dc.date.issued            2012-02-01
dc.description.abstract   One key method for automatically annotating the function of a prokaryotic gene is to find BLAST hits to that gene that carry functional annotations, choose the best single hit, and copy its annotation if the hit is of sufficient quality as measured by a p-value or another criterion. In the JCVI prokaryote automatic functional annotation system, the best hit is chosen by looking up categories in a manually constructed table that states how reliable an annotation is, depending on who made the annotation, the percent identity of the BLAST hit, and what percentage of the query gene and of the hit gene the match covers. Constructing this table is labor-intensive, and humans cannot process enough data to construct it correctly. I therefore reduced the data requirements by breaking the table into orthogonal components, and I developed an iterative method to minimize the table's least-squares error on a training set. I also constructed a validation set of 50,000 manually annotated proteins from JCVI data, and developed a protein-name thesaurus and ontology that make it possible to tell when two names mean the same thing, or when one name is a more specific refinement of another. Training on 9/10 of the validation set and testing on the held-out 1/10 showed an improvement in accuracy from 61.5% to 72.4%.
dc.identifier.uri         https://hdl.handle.net/1920/7500
dc.language.iso           en_US
dc.subject                Protein Function
dc.subject                Functional Annotation
dc.subject                Genome
dc.title                  Automatic Functional Annotation of Prokaryotes
dc.type                   Thesis
thesis.degree.discipline  Bioinformatics
thesis.degree.grantor     George Mason University
thesis.degree.level       Master's
thesis.degree.name        Masters in Bioinformatics
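The abstract does not say exactly how the reliability table is broken into orthogonal components or how the iterative least-squares fit works. The Python sketch below is only a rough illustration under one stated assumption. It assumes an additive decomposition in which a hit's estimated reliability is a baseline plus one learned offset per factor. The factors are the ones the abstract names: annotation source, binned percent identity, binned query coverage, and binned hit coverage. The offsets are fitted by backfitting, which means repeatedly setting each factor's offsets to their least-squares optimum while holding the other factors fixed. The factor keys, the 10-point bin width, the `min_score` threshold, the helper `hit`, and all function names are hypothetical and not taken from the thesis. The training labels are toy values invented for illustration.

```python
from collections import defaultdict

# Hypothetical factor names (assumption); the thesis's actual categories may differ.
FACTORS = ("source", "identity_bin", "query_cov_bin", "hit_cov_bin")


def bin_value(percent, width=10):
    """Map a percentage in [0, 100] to a bin index in [0, 100 // width - 1]."""
    return min(int(percent // width), 100 // width - 1)


def predict(mu, effects, x):
    """Estimated reliability of a hit: baseline plus one offset per factor.

    Levels never seen in training contribute an offset of 0.
    """
    return mu + sum(effects[k].get(x[k], 0.0) for k in FACTORS)


def fit_factored_table(examples, labels, n_iter=100, tol=1e-12):
    """Fit labels ~ mu + sum_k effects[k][level] by backfitting.

    Each pass sets one factor's offsets to the mean partial residual per
    level. That is the exact least-squares optimum for that factor with the
    others held fixed, so the squared error never increases between passes.
    """
    mu = sum(labels) / len(labels)
    effects = {k: {} for k in FACTORS}
    prev_sse = float("inf")
    for _ in range(n_iter):
        for k in FACTORS:
            sums, counts = defaultdict(float), defaultdict(int)
            for x, y in zip(examples, labels):
                # Residual after removing the baseline and every factor except k.
                r = y - mu - sum(effects[j].get(x[j], 0.0) for j in FACTORS if j != k)
                sums[x[k]] += r
                counts[x[k]] += 1
            effects[k] = {lvl: sums[lvl] / counts[lvl] for lvl in sums}
        sse = sum((y - predict(mu, effects, x)) ** 2 for x, y in zip(examples, labels))
        if prev_sse - sse < tol:  # stop once the error stops improving
            break
        prev_sse = sse
    return mu, effects


def choose_best_hit(hits, mu, effects, min_score=0.5):
    """Return the most reliable hit, or None if even the best is below min_score.

    A caller would then copy the annotation of the returned hit, if any.
    """
    best = max(hits, key=lambda h: predict(mu, effects, h))
    return best if predict(mu, effects, best) >= min_score else None


def hit(source, identity, query_cov, hit_cov, hit_id=None):
    """Hypothetical helper: build a hit record from raw percentages."""
    return {"source": source, "identity_bin": bin_value(identity),
            "query_cov_bin": bin_value(query_cov),
            "hit_cov_bin": bin_value(hit_cov), "hit_id": hit_id}


# Toy training data (invented): 1.0 = the hit's annotation agreed with the
# curated name, 0.0 = it did not.
train = [hit("manual", 95, 98, 97), hit("manual", 35, 55, 65),
         hit("auto", 92, 96, 88), hit("auto", 31, 47, 52)]
labels = [1.0, 0.0, 1.0, 0.0]
mu, effects = fit_factored_table(train, labels)

candidates = [hit("auto", 33, 44, 58, "hit_A"), hit("manual", 91, 93, 99, "hit_B")]
best = choose_best_hit(candidates, mu, effects)
print(best["hit_id"] if best else "no hit reliable enough")
```

Backfitting is used here only because each step reduces to a closed-form mean. A direct least-squares solve over one indicator column per factor level would reach the same fitted predictions. The additive form is what cuts the data requirement: the model needs one offset per factor level rather than one cell per combination of levels.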

Files

Original bundle
Name: Goetz_thesis_2011.pdf
Size: 382.89 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.65 KB
Format: Item-specific license agreed upon to submission