Summarization, Visualization, and Mining of Molecular Landscapes






This dissertation addresses summarization, visualization, and recognition problems that arise in high-dimensional spatial data. The data of interest come from computational biology problems that aim to relate molecular three-dimensional structure to molecular function. Molecular structure has long been recognized as a carrier and regulator of biological function. Technological advances have revealed that intrinsically dynamic molecules, such as proteins, populate a vast structure space, assuming different structures to modulate interactions with different partners in the cell. It is now commonplace to obtain hundreds of thousands of three-dimensional structures of a biological molecule, leading to needle-in-a-haystack problems that have natural formulations under a machine learning framework. This dissertation aims to advance research on these problems in molecular structure spaces by exploiting the molecular energy landscape associated with a structure space. Specifically, graph-based representations and graph mining are leveraged to summarize the molecular energy landscape. The work shows that mining the energy landscape corresponding to a molecular structure space promises to advance our ability to recognize biologically active structures (the needles) in a vast space (the haystack) via supervised and unsupervised learning. While the focus is on molecular structure data, the approaches presented here are of general utility beyond computational biology, in any domain where the goal is to reveal informative organizations of high-dimensional spatial data that support prediction.