The guiding hypothesis of much of our research is that sequence and structural information combined with biophysical analysis can reveal the fundamental physical principles that underlie a wide range of biological phenomena. Our work includes theoretical research, biophysical measurements, the development of software tools, and specific applications to problems of biological importance. In the past few years, we have elucidated the structural and energetic origins of protein-protein, protein–nucleic acid, and protein-membrane interactions; developed methods for protein structure prediction; and detected novel structural and functional relationships between proteins based on their geometric similarity. We are currently focusing on three distinct areas: the molecular basis of cadherin-mediated cell-cell adhesion, the role of sequence-dependent DNA shape in protein-DNA recognition, and the development of novel structure-based software tools for protein function annotation.
Remote Structural and Functional Relationships Between Proteins
The ability to predict protein structure from primary amino acid sequence is a computational challenge that has assumed increased importance with the advent of structural genomics initiatives around the world. A stated goal of these initiatives is to use experimental techniques to solve enough protein structures so that computational methods can be used to predict most others. Advances in protein structure prediction will require the integration of sequence and structural alignment tools with the ability to evaluate the relative conformational energies of models derived from those alignments. Our lab has an active research program in each of these areas.
In many cases, protein structures can be assembled from substructures of proteins that are classified as belonging to different folds. Implicit in classification schemes is a hierarchical view whereby "structure space" is divided into isolated, non-overlapping "islands" that are denoted by categories such as folds. Our results suggest, however, that protein structure space is continuous—there are meaningful geometric relationships between proteins that are classified very differently. We have developed an approach that can detect such relationships, and we are using these relationships to assign function. The basic idea is to use structural alignments to identify a set of proteins that share a region of geometric similarity to a protein of interest; we then look for the conservation of specific residues or functional motifs that have been mapped onto the sequence in the aligned region. We are currently developing a user-friendly computational tool that will allow us to identify structural and functional relationships between proteins that were previously believed to be unrelated.
Nuance in the Double Helix: Protein Recognition of Minor Groove Shape
The recognition of specific DNA sequences by proteins has generally been thought to depend on two types of mechanisms: one that involves the formation of hydrogen bonds with specific bases, primarily in the major groove (direct readout), and one involving sequence-dependent deformations of the DNA helix (indirect readout). By comprehensively analyzing the three-dimensional structures of protein-DNA complexes, we have shown that the readout of minor groove shape by arginines is a third, widely used mechanism for protein-DNA recognition. We had previously identified this mechanism in the context of our study of Hox proteins, but we now have strong evidence as to its generality. Minor groove narrowing is often associated with the presence of A-tracts, defined as stretches of three or more As and Ts (with the exception of the flexible TpA step). A-tracts tend to narrow the minor groove, which provides a link between sequence and shape.
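The A-tract definition above can be turned into a simple sequence scan. The following is a minimal sketch, not the analysis pipeline used in this work: the function name `find_a_tracts`, the `min_len` parameter, and the choice to split a run at each TpA step (rather than discard the whole run) are assumptions made for illustration.

```python
def find_a_tracts(seq, min_len=3):
    """Return (start, end, subsequence) for each run of A/T bases that is
    at least `min_len` long and contains no TpA step.

    Coordinates are 0-based and end-exclusive. A TpA step ends the current
    run at the T and starts a new candidate run at the A (an assumption of
    this sketch).
    """
    seq = seq.upper()
    tracts = []
    start = None  # start index of the current A/T run, if any

    for i, base in enumerate(seq):
        in_at = base in "AT"
        # A TpA step: previous base is T and current base is A, inside a run.
        tpa_break = in_at and start is not None and seq[i - 1] == "T" and base == "A"

        # Close the current run on a G/C base or on a TpA step.
        if start is not None and (not in_at or tpa_break):
            if i - start >= min_len:
                tracts.append((start, i, seq[start:i]))
            start = None

        # Open a new run at the first A/T base after a break.
        if in_at and start is None:
            start = i

    # Close a run that extends to the end of the sequence.
    if start is not None and len(seq) - start >= min_len:
        tracts.append((start, len(seq), seq[start:]))

    return tracts
```

For example, `find_a_tracts("GCTTTAAAGC")` reports `TTT` and `AAA` as two separate tracts because the TpA step between them breaks the run, whereas `AAATTT` in `"GCAAATTTGC"` is reported as a single tract.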
The biophysical basis of minor groove recognition involves an enhancement of the negative potential of DNA through electrostatic focusing. Electrostatic potential is strongly correlated with minor groove width, and we have found that arginines are frequently located at local minima in width and potential. The nucleosome core particle is a striking example of this phenomenon. Our findings suggest that the recognition of local variations in DNA shape is a general mechanism by which proteins bind specific DNA sequences.
The lack of distinguishing moieties available for forming base-specific hydrogen bonds in the minor groove has been the source of a long-standing enigma of how "information" in the minor groove contributes to sequence-specific binding. Our description of a mechanism that depends on DNA shape and electrostatic potential provides a novel and elegant answer to this question. We are now further characterizing the biological role of minor groove recognition, and we are developing new computational tools to use the phenomenon as a basis for predicting transcription factor–binding sites and the presence of nucleosome-positioning signals.
Relating Molecular Binding Affinities to Cell-Cell Adhesive Specificity
"Classical" cadherins are major determinants of intercellular adhesive specificity in multicellular organisms. Classical cadherins have five extracellular domains (EC1–EC5) and mediate adhesion through the dimerization of protomers presented from apposing cells. The identity of the partnering cadherins determines cell-cell specificity; for example, during embryonic tissue development, cells in the neural tube that express N-cadherin separate from ectoderm cells that express E-cadherin. This phenomenon is reflected in vitro in cell assays that show that cells transfected with N- and E-cadherin sort from one another into separate aggregates. However, the sequences of the EC1 domains of N- and E-cadherin are ~60 percent identical (with even higher identity in the dimerization interface). We are interested in how such seemingly trivial differences at the sequence and structural levels mediate highly specific cell adhesion behavior.
Our research program on cell adhesion is carried out in close collaboration with the lab of Lawrence Shapiro (Columbia University), which has shown that cadherins dimerize through the swapping of N-terminal β-strands between the membrane-distal EC1 domains. To relate cell-cell adhesive specificity to molecular properties, we are carrying out binding affinity measurements on different cadherins. Surprisingly, the dimerization affinities for EC1-EC2 constructs of N- and E-cadherin differ by almost an order of magnitude. Moreover, the affinity for the formation of N and E heterodimers is intermediate between the two homodimer affinities. These results suggest that cadherins are not designed to have weak heterophilic interactions, as had been generally assumed, but rather that relatively large differences in the adhesive strengths of homophilic interactions play an important role in determining cell aggregation behavior. To elucidate how cadherin specificity is encoded in their molecular structures, we are carrying out computational analyses of the factors that determine binding affinities. We are also using statistical mechanical theories of liquid mixtures to relate the adhesive strengths of interacting molecules to cell aggregation behavior.
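To put an affinity difference of this size on an energy scale, the standard thermodynamic relation ΔΔG = RT ln(K_D,1 / K_D,2) can be applied. The sketch below is illustrative only: no measured dissociation constants are given in this summary, so the ratio of 10 is a stand-in for "almost an order of magnitude", and the function name and the temperature of 298.15 K are assumptions.

```python
import math

# Gas constant in kcal / (mol * K).
R_KCAL_PER_MOL_K = 1.987204e-3


def ddg_from_kd_ratio(kd_ratio, temp_k=298.15):
    """Free energy difference (kcal/mol) between two binding equilibria
    whose dissociation constants differ by the factor `kd_ratio`.

    Uses ddG = R * T * ln(K_D1 / K_D2). The default temperature is an
    assumption of this sketch, not a value taken from the experiments.
    """
    if kd_ratio <= 0:
        raise ValueError("kd_ratio must be positive")
    return R_KCAL_PER_MOL_K * temp_k * math.log(kd_ratio)


if __name__ == "__main__":
    # Hypothetical ratio standing in for "almost an order of magnitude".
    print(f"{ddg_from_kd_ratio(10.0):.2f} kcal/mol")
```

Under these assumptions, a full tenfold difference in dimerization affinity corresponds to roughly 1.4 kcal/mol at room temperature, so the difference described above would be somewhat less than that.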