Abstract:
Computational methods able to assist or complement wet-laboratory experiments in
structural characterization of molecular assemblies promise to provide detailed insight into
molecular interactions, drug design, and biological function in the living and diseased cell.
Methods that predict three-dimensional structures of protein-protein assemblies are abundant in computational structural biology. However, challenges remain in accurately detecting the interacting interface between participating units in an assembly. For search
algorithms, the task of predicting the biologically-active structure of an assembly poses
particular challenges due to the high dimensionality of the search space where potentially
relevant assembly configurations lie.
The work presented in this thesis is a step towards developing a new set of computational techniques and algorithms for structural characterization of protein-protein assemblies. Specifically, the work here focuses on modeling the three-dimensional quaternary
structure of a protein dimer, a complex formed by interactions between two participating
protein chains. This problem is commonly known as protein-protein docking. This work
addresses the problem of rigid protein-protein docking, where the given unbound structures of the protein units about to dimerize are assumed to be the same as the bound
ones after dimerization.
In addition to techniques proposed to alleviate certain computational difficulties related
to finding the correct docking interface in protein dimers, this thesis proposes a new probabilistic search algorithm that employs both geometry and energy to sample low-energy
configurations of a protein dimer. Analysis of evolutionary conservation and a geometric
treatment of the molecular surface are combined to identify potentially relevant contact interfaces between the two units in the dimer. Docking is restricted to evolutionarily
conserved, geometrically complementary regions on the units' molecular surfaces, resulting in a narrower search space of rigid-body motions that match only such regions. This
treatment is the first contribution of this work. The second contribution is a probabilistic
search algorithm that efficiently explores the space of rigid-body motions corresponding to
local minima in an energy function capturing interactions in a dimeric configuration.
The proposed algorithm is an adaptation of the Basin Hopping (BH) framework. The
work presented in this thesis details implementation and careful analysis of the components
that result in an effective BH algorithm for rigid protein-protein docking. Application to
a diverse list of proteins shows that the algorithm is able to recover the native dimeric configuration as well as produce other relevant minima near the native configuration of a
given dimer. A detailed analysis is presented that shows the algorithm reproduces known
properties of the BH framework observed in other contexts and applications, most notably the relationship between the adjacency of consecutively obtained local minima and their proximity to
the known native dimeric configuration. Taken together, the results presented show that
the algorithm can be employed as a first stage in a computational docking protocol to
sample low-energy near-native dimeric configurations that can then be further refined and
discriminated with more computationally intensive optimization protocols.
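As an illustration of the generic Basin Hopping loop referred to above (perturb the current local minimum, minimize locally, then accept or reject with a Metropolis criterion), the following is a minimal sketch in Python. It is not the algorithm developed in this thesis: the energy function `toy_energy`, the pattern-search minimizer `local_minimize`, and all parameter values (`jump`, `temperature`, `n_steps`) are hypothetical stand-ins chosen for illustration. In a docking setting, the vector `x` would instead encode a rigid-body motion of one unit relative to the other, and the energy would score the resulting dimeric configuration.

```python
import math
import random


def toy_energy(x):
    """Hypothetical rugged energy: quadratic bowl plus cosine ripples.

    The global minimum is 0 at the origin; other local minima exist
    near +/-1.87 along each coordinate.
    """
    return sum(xi * xi + 2.0 * (1.0 - math.cos(3.0 * xi)) for xi in x)


def local_minimize(energy, x, step=0.1, tol=1e-6):
    """Derivative-free pattern search (an assumed stand-in minimizer).

    Tries +/- step along each axis, keeps any improving move, and
    halves the step once no axis move improves the energy.
    """
    x = list(x)
    e = energy(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x[:]
                trial[i] += delta
                e_trial = energy(trial)
                if e_trial < e:
                    x, e = trial, e_trial
                    improved = True
        if not improved:
            step *= 0.5
    return x, e


def basin_hopping(energy, x0, n_steps=200, jump=1.0, temperature=1.0, seed=0):
    """Generic Basin Hopping: perturb, minimize, Metropolis accept/reject."""
    rng = random.Random(seed)
    current, e_current = local_minimize(energy, x0)
    best, e_best = current, e_current
    trajectory = [e_current]  # energy of the current minimum after each step
    for _ in range(n_steps):
        # Perturbation: random displacement away from the current minimum.
        perturbed = [xi + rng.uniform(-jump, jump) for xi in current]
        # Local minimization maps the perturbed point to a nearby minimum.
        candidate, e_candidate = local_minimize(energy, perturbed)
        # Metropolis criterion: always accept downhill, sometimes uphill.
        delta_e = e_candidate - e_current
        if delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature):
            current, e_current = candidate, e_candidate
            if e_current < e_best:
                best, e_best = current, e_current
        trajectory.append(e_current)
    return best, e_best, trajectory


if __name__ == "__main__":
    best, e_best, trajectory = basin_hopping(toy_energy, [4.0, -3.5])
    print("lowest minimum found:", best, "energy:", e_best)
```

The returned `trajectory` records the energy of the accepted minimum after every step, which is the kind of sequence of consecutively obtained minima one would inspect when relating the behavior of the search to proximity to a known reference configuration.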