Latent Variable Models of Sequence Data for Classification and Discovery

Blasiak, Samuel J.

Files

Blasiak_gmu_0883E_10448.pdf (2.25 MB)

Date

2013-08

Authors

Blasiak, Samuel J.

Abstract

The need to operate on sequence data is prevalent across a range of real-world applications, including protein/DNA classification, speech recognition, intrusion detection, and text classification. Sequence data can be distinguished from the more typical vector representation in that the length of sequences within a dataset can vary and the order of symbols within a sequence carries meaning. Although it has become increasingly easy to collect large amounts of sequence data, our ability to infer useful information from these sequences has not kept pace. For instance, in the domain of biological sequences, experimentally determining the order of amino acids in a protein is far easier than determining the protein's physical structure or its role within a living organism. This asymmetry holds across a number of sequence data domains, and, as a result, researchers increasingly rely on computational techniques to infer properties of sequences that are either difficult or costly to collect through direct measurement. The methods I describe in this dissertation attempt to mitigate this asymmetry by advancing state-of-the-art techniques for extracting useful information from sequence data.

Keywords

Computer science, Hidden Markov Model, Latent Variable Model, Neural Network, Sequences, Sparse Dictionary Learning, Topic Model

URI

https://hdl.handle.net/1920/8798

Collections

College of Engineering and Computing
