Unsupervised Learning for Molecular Structure Discoveries






We have long known that form determines function. This is particularly true of biological molecules, which use their three-dimensional structures to interface with one another and propagate chemical reactions in the living cell. We also now better understand how vast and rich the structure space available to a molecule is, and how little we know about what information to extract from this space to better characterize the structure(s)-function(s) relationship in biological molecules. This dissertation puts forth computational concepts and techniques to support this goal. In particular, we develop algorithms that organize the structure space of a molecule and reveal one or more important structural states of small molecules, macromolecules, and complexed molecules. The algorithms proposed here fall under the umbrella of unsupervised learning but leverage explicit or implicit embeddings of molecular structures in discrete data structures, such as graphs, to better exploit proximity in structure space when capturing structural states. The proposed algorithms employ diverse formalizations and show the power of those formalizations in addressing increasingly complex problems and application settings. Rigorous evaluation on hallmark problems in computational structural biology suggests that the leveraged formalizations and proposed algorithms advance research on unsupervised learning of the organization of molecular structure spaces.



Clustering Algorithms, Matrix Factorization, Molecular Dynamics, Protein Structure, Tensor Factorization, Unsupervised Learning