Computational Studies of the Structure and Function of Biological Macromolecules
Summary: Barry Honig's research involves the use of computational and biophysical approaches to study the structure and function of biological macromolecules.
The guiding hypothesis of much of our research is that sequence and structural information combined with biophysical analysis can reveal the fundamental physical principles that underlie a wide range of biological phenomena. Our work includes theoretical research, biophysical measurements, the development of software tools, and specific applications to problems of biological importance. In the past few years, we have elucidated the structural and energetic origins of protein-protein, protein–nucleic acid, and protein-membrane interactions; developed methods for protein structure prediction; and detected novel structural and functional relationships between proteins based on their geometric similarity. We are currently focusing on three distinct areas: the large-scale use of structural information to predict protein function, the role of sequence-dependent DNA shape in protein-DNA recognition, and the molecular basis of cadherin-mediated cell-cell adhesion.
Integrating Structural and Systems Biology
The genome-wide identification of pairs of interacting proteins is an important step in the elucidation of cell regulatory mechanisms. Much of our current knowledge derives from high-throughput experimental techniques as well as from manual curation of experiments on individual systems. Three-dimensional structural information has had only limited impact on this problem, in part because the number of protein sequences is vastly greater than the number of available structures. We have developed a new method, PREPPI, that uses three-dimensional structural information to predict protein-protein interactions with an accuracy and coverage that compare favorably to high-throughput experiments. PREPPI uses structural information on an unprecedented scale, an accomplishment that has been made possible through the extensive use of homology modeling combined with the exploitation of remote structural relationships. A key element in PREPPI design is a novel method that can produce structural representations (interaction models) of literally billions of putative protein-protein complexes that can then be scored in an ultrafast way. We believe that PREPPI is an important step in the widespread use of computational and structural tools to identify previously undetected protein-protein interactions and in the integration of structural and systems biology.
We are currently enhancing PREPPI with algorithmic refinements, incorporating new sources of data, and extending its range of applicability to interactions between structured domains and unstructured peptides. We are also applying it to problems such as the identification and characterization of direct protein-protein interactions in host-virus interactions and cancer signaling pathways.
The Role of DNA Shape in Protein-DNA Recognition
The recognition of specific DNA sequences by proteins has generally been thought to depend on two types of mechanisms: one that involves the formation of hydrogen bonds with specific bases, primarily in the major groove (direct readout), and one involving sequence-dependent deformations of the DNA helix (indirect readout). By comprehensively analyzing the three-dimensional structures of protein-DNA complexes, we have shown that the readout of minor groove shape by arginines is a third, widely used mechanism for protein-DNA recognition. Minor groove narrowing is often associated with the presence of A-tracts, defined as stretches of three or more As and Ts (with the exception of the flexible TpA step). The biophysical basis of minor groove recognition involves an enhancement of the negative potential of DNA through electrostatic focusing. The nucleosome core particle offers a striking example of this phenomenon. Our findings suggest that the recognition of local variations in DNA shape is a general mechanism by which proteins bind specific DNA sequences.
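The A-tract definition above can be illustrated with a minimal sketch. It is not the analysis pipeline used in this work: the function name `find_a_tracts` and the `min_len` parameter are hypothetical, and the code assumes that a TpA step (a T immediately followed by an A) splits a run of A and T bases into separate candidate tracts.

```python
import re


def find_a_tracts(seq, min_len=3):
    """Return (start, end, tract) tuples for candidate A-tracts in seq.

    Coordinates are 0-based with an exclusive end. A tract is taken to be a
    run of at least min_len consecutive A/T bases containing no TpA step
    (an assumption made for this illustration).
    """
    seq = seq.upper()
    tracts = []
    for run in re.finditer(r"[AT]+", seq):
        start = run.start()
        text = run.group()
        piece_start = 0
        for i in range(1, len(text) + 1):
            at_end = i == len(text)
            # Close the current piece at the end of the run or at a TpA step.
            if at_end or (text[i - 1] == "T" and text[i] == "A"):
                if i - piece_start >= min_len:
                    tracts.append(
                        (start + piece_start, start + i, text[piece_start:i])
                    )
                piece_start = i
    return tracts
```

Under these assumptions, `find_a_tracts("GCAAATTTGC")` reports the single tract AAATTT, because the ApT step in its middle is allowed, whereas TTTAAA is split at its TpA step into TTT and AAA. Such a scan only flags sequences where minor groove narrowing is likely; it does not compute groove width.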
The importance of the minor groove recognition mechanism that we identified was confirmed in a high-throughput analysis of the binding of Hox proteins to DNA. Hox proteins are transcription factors that determine the anterior-posterior axis in developing embryos; differences in the DNA-binding specificities of these proteins are crucial determinants of cell fate. Using SELEX-seq experiments, we identified DNA sequences specific to different Hox proteins and found that this specificity is revealed only in the presence of a second DNA-binding protein, Exd. Using novel structure prediction tools, we analyzed the sequences identified in our experiments and found that there are distinct patterns of minor groove width for the eight Hox proteins that correlate with their functional domains along the anterior-posterior axis. This remarkable finding establishes the importance of minor-groove topography as a crucial factor in protein-DNA recognition and, more generally, shows how subtle differences in DNA shape can be exploited by closely related members of a single protein family to achieve sequence-specific recognition of their binding sites.
To identify other specificity determinants, we are currently carrying out crystallographic studies of ternary complexes formed by Hox proteins, Exd, and DNA. In parallel we are extending our computational/experimental studies to other transcription factor families in an attempt to identify new rules for protein-DNA recognition.
Relating Molecular Binding Affinities to Cell-Cell Adhesive Specificity
"Classical" cadherins are major determinants of intercellular adhesive specificity in multicellular organisms. Classical cadherins have five extracellular domains (EC1–EC5) and mediate adhesion through the dimerization of protomers presented from apposing cells. The identity of the partnering cadherins determines cell-cell specificity; for example, during embryonic tissue development, cells in the neural tube that express N-cadherin separate from ectoderm cells that express E-cadherin. This phenomenon is reflected in vitro in cell assays that show that cells transfected with N- and E-cadherin sort from one another into separate aggregates. However, the sequences of the EC1 domains of N- and E-cadherins are ~60 percent identical (and the identity is even higher in the dimerization interface). We are interested in how such seemingly trivial differences at the sequence and structural levels mediate highly specific cell-sorting behavior and, more generally, we would like to determine the relationship between the molecular properties of adhesion molecules and cellular phenomena.
Our research program on cell adhesion is carried out in close collaboration with the lab of Lawrence Shapiro (Columbia University). Our work involves protein crystallography, biophysical measurements, in vitro cell assays, and in vivo studies, all combined with theoretical and computational simulations at multiple levels of granularity. Most recently we have focused on the lateral (cis) clustering of cadherin ectodomains following trans binding to cadherins on apposing cells. Cadherins are known to form ordered structures following trans binding, but the mechanism was completely obscure. In a series of papers we showed that cadherins form a two-dimensional crystal-like lattice in the interface between two cells and that this process is driven by the coupling between trans and cis interactions. Specifically, when cadherins form trans interactions their flexibility is dramatically reduced, thus reducing the entropic penalty associated with the formation of inherently weaker cis interactions. The discovery of this mechanism was made possible by a new theory we developed that showed how to transform binding affinities measured in solution (three-dimensional) into two-dimensional affinities that are relevant to the constrained environment of a membrane surface. When we combined this theory with novel multiscale molecular simulations, we were able to simulate the process of junction formation, starting only with structural information and measured solution-binding affinities. We believe that the methods we developed and the underlying concepts will be relevant to many other adhesion systems. We are now carrying out experimental tests of our models on cadherins while using computational tools to explore other adhesion receptors.
Portions of this work are also supported by the National Institutes of Health and the National Science Foundation.
As of June 18, 2012