Evolutionary Techniques for De Novo Protein Conformation Ensemble Generation
Date
2021
Abstract
The conformations in which a protein molecule arranges its amino-acid chain are primary determinants of its ability to interact with other molecules in the living cell. Discovering the functionally relevant conformations of a protein is crucial to elucidating its functional repertoire and to furthering our understanding of diseases driven by mutations that affect the ability of a protein to assume specific conformations. Discovering the possibly diverse set of biologically relevant conformations of a protein from knowledge of its amino-acid sequence alone remains an outstanding challenge. While progress has been made in this direction, most notably by AlphaFold2, in discovering what is often referred to as the native structure, methods based on machine learning, including deep learning, are limited in their ability to capture the entirety of the conformation space of a protein. While obtaining one conformation is sufficient for some proteins, many others take part in multiple cellular reactions and harness their ability to assume different conformations to achieve functional plasticity. Nowhere is this ability more visible in the public eye than it is today; we are all familiar with images that show the spike protein of the SARS-CoV-2 virus switching between an open and a closed conformation to elude our immune system and strike at just the right moment by binding with the ACE2 receptor in its open conformation. This dissertation presents a way forward on exploring the conformation space of a given protein molecule when the only information available is its amino-acid sequence. We refer to this problem as de novo protein conformation ensemble generation. In this dissertation, we present several novel stochastic optimization algorithms that operate under the umbrella of evolutionary computation.
We show that these algorithms balance the competing demands of exploration and exploitation and capture meaningful representations of the conformation space, despite its size and complexity. In particular, we show that they capture the presence of significantly different, functionally relevant conformations in metamorphic proteins; we also provide this set of metamorphic proteins to the community as a benchmark to further advance research on this problem. The work presented in this dissertation represents important groundwork for researchers aiming to improve protein conformation sampling in order to better understand the structural and functional plasticity of protein molecules in all their exquisite complexity.