Latent Variable Models of Sequence Data for Classification and Discovery

dc.contributor.advisor: Rangwala, Huzefa
dc.contributor.author: Blasiak, Samuel J.
dc.creator: Blasiak, Samuel J.
dc.date.accessioned: 2014-08-28T03:17:38Z
dc.date.available: 2014-08-28T03:17:38Z
dc.date.issued: 2013-08
dc.description.abstract: The need to operate on sequence data is prevalent across a range of real-world applications including protein/DNA classification, speech recognition, intrusion detection, and text classification. Sequence data can be distinguished from the more typical vector representation in that the length of sequences within a dataset can vary and that the order of symbols within a sequence carries meaning. Although it has become increasingly easy to collect large amounts of sequence data, our ability to infer useful information from these sequences has not kept pace. For instance, in the domain of biological sequences, experimentally determining the order of amino acids in a protein is far easier than determining the protein's physical structure or its role within a living organism. This asymmetry holds over a number of sequence data domains, and, as a result, researchers increasingly rely on computational techniques to infer properties of sequences that are either difficult or costly to collect through direct measurement. The methods I describe in this dissertation attempt to mitigate this asymmetry by advancing state-of-the-art techniques for extracting useful information from sequence data.
dc.format.extent: 210 pages
dc.identifier.uri: https://hdl.handle.net/1920/8798
dc.language.iso: en
dc.rights: Copyright 2013 Samuel J. Blasiak
dc.subject: Computer science
dc.subject: Hidden Markov Model
dc.subject: Latent Variable Model
dc.subject: Neural Network
dc.subject: Sequences
dc.subject: Sparse Dictionary Learning
dc.subject: Topic Model
dc.title: Latent Variable Models of Sequence Data for Classification and Discovery
dc.type: Dissertation
thesis.degree.discipline: Computer Science
thesis.degree.grantor: George Mason University
thesis.degree.level: Doctoral

Files

Original bundle
Name: Blasiak_gmu_0883E_10448.pdf
Size: 2.25 MB
Format: Adobe Portable Document Format