Computational Protein Design
Summary: Stephen Mayo is interested in the theoretical, computational, and experimental aspects of protein design and folding.
The focus of our lab has been the development and application of computational protein design methods with the overarching aim of elucidating the relationships between amino acid sequence, protein structure, and biological function. In contrast to a traditional “mutagenic” paradigm, we seek to develop quantitative models based on protein physical chemistry that allow for the direct testing of hypotheses that speak to the structure, function, and evolution of biological macromolecules. This “design” paradigm manifests itself first in the development of computational methodologies capable of addressing the combinatorial complexity of protein sequence design, second in the development of potential energy functions tailored specifically for protein design, and finally in the application of the resulting quantitative methods for the exploration of a broad range of biophysical questions, ranging from the thermodynamic basis of protein stability to the role of calmodulin in synaptic plasticity. Our efforts encompass both multidisciplinary and interdisciplinary activities, including applied mathematics and computer science, physical chemistry, experimental protein chemistry, and NMR (nuclear magnetic resonance) and x-ray crystallographic structure determination.
Computational Methods Development
The large-scale combinatorial problem associated with computational protein design has largely been overcome in recent years. In particular, two algorithms have emerged that allow for the exact (or nearly exact) solution of even exceedingly large problems. The HERO (hybrid exact rotamer optimization) algorithm creatively combines dominance criteria based on the dead-end elimination theorem and a stochastic search into a unified framework that results in an overall exact, but nondeterministic, search capable of identifying global minimum energy conformations (GMECs) for large design problems. The HERO algorithm has become our standard tool when GMEC solutions are desired (as in cases related to force field optimization).
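The dominance idea behind dead-end elimination can be illustrated with a minimal sketch. It applies the Goldstein form of the elimination criterion to a toy pairwise energy model and then checks by brute force that the minimum energy conformation survives the pruning. This is not the HERO algorithm, whose combination of dominance criteria with a stochastic search is not reproduced here. The energy tables and the function names (total_energy, pair, goldstein_dee) are hypothetical and chosen only for illustration.

```python
from itertools import product

# Toy energies (arbitrary units): 3 positions, 3 candidate rotamers each.
self_e = [[0.0, 1.0, 5.0], [0.5, 0.0, 4.0], [0.0, 2.0, 0.3]]
pair_e = {
    (0, 1): [[0.0, 0.5, 1.0], [-1.0, 0.2, 0.0], [0.3, 0.3, 0.3]],
    (0, 2): [[0.2, -0.4, 0.0], [0.0, 0.1, -0.6], [0.5, 0.5, 0.5]],
    (1, 2): [[-0.3, 0.0, 0.4], [0.1, -0.2, 0.0], [0.0, 0.9, -0.8]],
}

def total_energy(assign, self_e, pair_e):
    """Energy of a full assignment (one rotamer index per position)."""
    n = len(assign)
    energy = sum(self_e[i][assign[i]] for i in range(n))
    energy += sum(pair_e[(i, j)][assign[i]][assign[j]]
                  for i in range(n) for j in range(i + 1, n))
    return energy

def pair(pair_e, i, r, j, s):
    """Pair energy lookup that works for either ordering of the positions."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

def goldstein_dee(self_e, pair_e):
    """Repeatedly drop rotamer r at position i when some rotamer t satisfies
    E(i_r) - E(i_t) + sum_j min_s [E(i_r, j_s) - E(i_t, j_s)] > 0."""
    n = len(self_e)
    alive = [set(range(len(self_e[i]))) for i in range(n)]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in list(alive[i]):
                for t in alive[i] - {r}:
                    gap = self_e[i][r] - self_e[i][t]
                    gap += sum(
                        min(pair(pair_e, i, r, j, s) - pair(pair_e, i, t, j, s)
                            for s in alive[j])
                        for j in range(n) if j != i)
                    if gap > 0:  # r can never beat t, so r is a dead end
                        alive[i].discard(r)
                        changed = True
                        break
    return alive

alive = goldstein_dee(self_e, pair_e)

# Brute-force check: the best assignment is unchanged by the pruning.
def best_of(choices):
    return min(product(*choices), key=lambda a: total_energy(a, self_e, pair_e))

gmec_full = best_of([range(len(s)) for s in self_e])
gmec_pruned = best_of([sorted(s) for s in alive])
```

Because the criterion only removes rotamers that are provably worse than an alternative in every context, the exhaustive search over the surviving rotamers returns the same conformation as the search over the full space.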
More recently, we have changed our focus to the further development of combinatorial optimization algorithms for cases where GMEC solutions are not necessarily required (most large design cases). We have extended the FASTER (fast and accurate side-chain topology and energy refinement) algorithm, first developed by Johan Desmet (AlgoNomics, Belgium) and coworkers for side-chain placement calculations, to allow for amino acid sequence design. The most significant extensions include parallel initializations for compatibility with both the combinatorial demands of sequence design and the typical multiprocessor hardware setups used for large calculations, replacement of the initial phases of the calculation with a fast Monte Carlo search that provides vastly superior initial rotamer configurations, and implementation of a “zone” optimization procedure in the final refinement steps that significantly accelerates the calculation without compromising solution quality. As an example, for a large optimization composed of 10^303 possible rotamer solutions, these extensions result in a performance improvement of as much as 8-fold (11.5 hours vs. 1.5 hours) over the published version of the FASTER algorithm. In this case, the resulting solution is the GMEC, which was confirmed by HERO. The time required to obtain the GMEC using HERO, however, is 59 times longer (~4 days) than for our modified version of FASTER.
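A Monte Carlo search over rotamer assignments, of the general kind used above to seed the refinement, might be sketched as follows. This is a generic fixed-temperature Metropolis search on a randomly generated toy problem. It is not the modified FASTER implementation, and the problem size, temperature, step count, and function names are all assumptions made for illustration.

```python
import math
import random

def make_toy_problem(n_pos, n_rot, rng):
    """Random self and pair energy tables (arbitrary units) for illustration."""
    self_e = [[rng.uniform(0.0, 2.0) for _ in range(n_rot)] for _ in range(n_pos)]
    pair_e = {(i, j): [[rng.uniform(-1.0, 1.0) for _ in range(n_rot)]
                       for _ in range(n_rot)]
              for i in range(n_pos) for j in range(i + 1, n_pos)}
    return self_e, pair_e

def total_energy(assign, self_e, pair_e):
    """Energy of a full assignment (one rotamer index per position)."""
    n = len(assign)
    energy = sum(self_e[i][assign[i]] for i in range(n))
    energy += sum(pair_e[(i, j)][assign[i]][assign[j]]
                  for i in range(n) for j in range(i + 1, n))
    return energy

def monte_carlo(self_e, pair_e, steps=5000, temperature=0.5, seed=0):
    """Metropolis search: change one rotamer at a time, keep the best seen."""
    rng = random.Random(seed)
    n = len(self_e)
    current = [rng.randrange(len(self_e[i])) for i in range(n)]
    e_cur = total_energy(current, self_e, pair_e)
    start_energy = e_cur
    best, e_best = list(current), e_cur
    for _ in range(steps):
        i = rng.randrange(n)
        trial = list(current)
        trial[i] = rng.randrange(len(self_e[i]))
        e_trial = total_energy(trial, self_e, pair_e)
        delta = e_trial - e_cur
        # Always accept downhill moves; accept uphill moves with Boltzmann odds.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return best, e_best, start_energy

self_e, pair_e = make_toy_problem(n_pos=6, n_rot=4, rng=random.Random(1))
best, e_best, start_energy = monte_carlo(self_e, pair_e)
```

The returned assignment is only a good starting configuration, not a guaranteed minimum. In a full pipeline it would be handed to a refinement stage, and for clarity the sketch recomputes the total energy at every step rather than updating it incrementally.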
Continuum Electrostatic Solvation for Protein Design
Protein design is an exceptionally difficult problem characterized by unique complications. Necessary restrictions such as a fixed protein backbone and discrete side-chain conformations (rotamers) require different considerations of structure-energy relationships than other fields of protein simulation. This structure-energy relationship has been a long-standing focus of our research, which strives to address issues including the identity of the forces that lead to protein stability and the relative strengths of these forces. Until now, damped coulombic potentials as well as empirical surface area and volume-scaling functions have been used to include electrostatic solvation energy in computational protein design calculations. These methods have allowed for the successful design of stable proteins but have been a limiting factor in the rational design of enzymatic activity and molecular recognition, for which polar and charged amino acids are key. To bring protein design energy functions up to date with these challenges, we are investigating more sophisticated continuum models for electrostatic solvation. Two related obstacles to improving electrostatic solvation energy functions are the combinatorial explosion in protein design, which requires energy scores for many side chains and pairs of side chains and therefore a very fast energy solver, and the need to calculate energies in one-body (single side chain) and two-body (pairs of side chains) terms without any knowledge of the rest of the structure.
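The one-body and two-body decomposition described above can be made concrete with a small sketch. It precomputes energy tables with a distance-dependent dielectric Coulomb term, one common form of damped coulombic potential. The dielectric scale of 40, the toy coordinates and charges, and the names damped_coulomb and build_tables are illustrative assumptions, not values or code taken from the lab's force field.

```python
import math

COULOMB_KCAL = 332.06  # converts e^2/Angstrom to kcal/mol

def damped_coulomb(group_a, group_b, dielectric_scale=40.0):
    """Coulomb sum with eps(r) = dielectric_scale * r, i.e. q_a*q_b / (scale * r^2).
    Each group is a list of (charge, (x, y, z)) tuples, distances in Angstroms."""
    energy = 0.0
    for charge_a, xyz_a in group_a:
        for charge_b, xyz_b in group_b:
            r = math.dist(xyz_a, xyz_b)
            energy += COULOMB_KCAL * charge_a * charge_b / (dielectric_scale * r * r)
    return energy

def build_tables(backbone, rotamers):
    """rotamers[pos] is a list of atom groups, one per candidate rotamer.
    one_body: side chain vs. fixed backbone; two_body: side chain vs. side chain.
    Neither table needs to know which rotamers are chosen elsewhere."""
    one_body, two_body = {}, {}
    positions = sorted(rotamers)
    for i in positions:
        for r, atoms_r in enumerate(rotamers[i]):
            one_body[(i, r)] = damped_coulomb(atoms_r, backbone)
            for j in positions:
                if j <= i:
                    continue
                for s, atoms_s in enumerate(rotamers[j]):
                    two_body[(i, r, j, s)] = damped_coulomb(atoms_r, atoms_s)
    return one_body, two_body

# Toy system: one backbone charge and two positions with two rotamers each.
backbone = [(0.5, (0.0, 0.0, 0.0))]
rotamers = {
    0: [[(1.0, (4.0, 0.0, 0.0))], [(1.0, (0.0, 4.0, 0.0))]],
    1: [[(-1.0, (8.0, 0.0, 0.0))], [(-1.0, (4.0, 4.0, 0.0))]],
}
one_body, two_body = build_tables(backbone, rotamers)
```

Once the tables exist, the energy of any full rotamer assignment is a sum of table lookups, which is what makes a fast energy function so important: the number of table entries grows with the square of the number of candidate rotamers.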
Our first interest is in fast perturbation methods for the two-body terms, which would make the otherwise lengthy numerical solution of the Poisson-Boltzmann equation feasible for a large number of side-chain pairs. We are also testing the speed and accuracy of various analytical generalized Born methods. Coupled with strategies for approximating a molecular surface during the design calculation, both approaches allow us to more accurately describe the energy of a protein's charge distribution in the context of its molecular geometry and surrounding solvent. Such improvements in the electrostatic solvation energy model for protein design will have a significant impact in the areas of enzyme design and molecular recognition.
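As a point of reference for the analytical generalized Born approach, the sketch below evaluates the widely used pairwise generalized Born expression of Still and coworkers for a set of point charges. It assumes the effective Born radii are already known, which is the hard part in practice because they depend on the molecular surface. The dielectric constants, the example ion, and the function name are illustrative assumptions, and the sketch does not represent the specific variants being tested.

```python
import math

COULOMB_KCAL = 332.06  # converts e^2/Angstrom to kcal/mol

def gb_solvation_energy(charges, coords, born_radii, eps_in=1.0, eps_out=80.0):
    """Generalized Born polarization energy in kcal/mol:
    dG = -1/2 * k * (1/eps_in - 1/eps_out) * sum_ij q_i q_j / f_GB(r_ij),
    f_GB = sqrt(r^2 + R_i R_j exp(-r^2 / (4 R_i R_j))).
    The double sum includes i == j, which gives each atom's Born self-energy."""
    prefactor = -0.5 * COULOMB_KCAL * (1.0 / eps_in - 1.0 / eps_out)
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(n):
            r2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            rr = born_radii[i] * born_radii[j]
            f_gb = math.sqrt(r2 + rr * math.exp(-r2 / (4.0 * rr)))
            total += charges[i] * charges[j] / f_gb
    return prefactor * total

# A single unit charge with a 2 Angstrom Born radius reduces to the Born ion formula.
single_ion = gb_solvation_energy([1.0], [(0.0, 0.0, 0.0)], [2.0])

# An ion pair 3 Angstroms apart: self-energies plus a screened cross term.
ion_pair = gb_solvation_energy(
    [1.0, -1.0], [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)], [2.0, 2.0])
```

For a single ion the expression collapses to the Born formula, which provides a simple sanity check on the implementation. For the oppositely charged pair, the cross term partly cancels the two self-energies, so the pair is less favorably solvated than two isolated ions.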
A prominent goal of protein design is the generation of proteins with novel functions, including the catalytic rate enhancement of chemical reactions at which natural enzymes are so efficient. The ability to design an enzyme to perform a given chemical reaction has considerable practical application for industry and medicine. Significant progress has been made at enhancing the catalytic properties of existing enzymes; however, the design of proteins with novel catalytic properties has met with relatively limited success. We have developed and implemented a general computational approach for the design of enzyme-like proteins with novel catalytic activities. In addition to the generation of new catalysts, these methods will allow the exploration of the mechanistic basis of enzymatic activity.
Recently we have been interested in creating a completely novel catalyst for the Claisen rearrangement of chorismate to prephenate. Naturally catalyzed by the chorismate mutases, this reaction offers many desirable features as an early test of enzyme design methods. The reaction, a first-order sigmatropic rearrangement of a single substrate, has neither intermediate steps nor involvement of catalytic groups such as general acids or bases. The reaction has been extensively studied in many contexts—as a rare enzyme-catalyzed pericyclic process, as an essential step in the biosynthesis of aromatic compounds, and as an example of a reaction that occurs through identical mechanisms enzymatically and in solution. Our method of enzyme design involves identifying amino acid sequences likely to bind to the transition-state structure of the chorismate-prephenate rearrangement. As a part of this process, we are testing the ability of our method to predict mutations that enhance the activity of the naturally occurring Escherichia coli chorismate mutase. The computationally designed Ala32Ser mutation results in an enzyme with measurably enhanced activity.
Biologically functional proteins often carry out their actions by interacting with other components in the cell, and protein-protein association serves a very important role. Proteins can bind directly to their targets to carry out a function or they can bind specifically to themselves, forming higher-order structures to perform their duties. We are interested in learning how proteins utilize their surface residues to interact with other proteins. We are also curious about the influence protein backbone geometry has on complex formation.
Previous efforts in designing protein/protein-binding interfaces have focused on altering binding specificities. Because of difficulties in accurately modeling protein backbones, however, these methods fall short when applied to the design of novel binding sites. Our short-term goal is to create novel dimers from monomeric proteins. We developed a special docking algorithm that positions the member protein subunits in plausible configurations with respect to each other, using parameters determined from the structures of known protein complexes. The docking procedure treats the proteins as rigid bodies and uses the Fourier correlation theorem and the fast Fourier transform to search efficiently for dimers with the highest interfacial surface complementarities. Using the docked structures as scaffolds for protein design and employing hydrophobic surface residues to drive dimer formation, we demonstrated two successful designs, one heterodimer and one homodimer, using protein G and engrailed homeodomain, respectively, as the starting monomeric proteins. Circular dichroism, nuclear magnetic resonance, analytical ultracentrifugation, and x-ray crystallography methods were used to characterize the computationally designed dimers. These results suggest that this strategy can be used to address the protein recognition problem and is generally applicable to creating novel binding sites with compatible binding partners.
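The role of the Fourier correlation theorem in rigid-body docking can be illustrated in one dimension. The sketch below scores every translation of a toy "ligand" grid against a toy "receptor" grid in a single pass, using a small radix-2 FFT written with the standard library, and compares the result with the direct sum. Real docking works on three-dimensional grids with scoring values that reward surface contact and penalize core overlap, and it repeats the calculation over many rotations. The grids and function names here are hypothetical.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 Cooley-Tukey transform (unnormalized).
    The input length must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def correlate_fft(receptor, ligand):
    """Circular correlation c[s] = sum_x receptor[x] * ligand[x - s] for real
    grids, computed as IFFT(FFT(receptor) * conj(FFT(ligand)))."""
    n = len(receptor)
    spectrum = [a * b.conjugate() for a, b in zip(fft(receptor), fft(ligand))]
    return [v.real / n for v in fft(spectrum, inverse=True)]

def correlate_direct(receptor, ligand):
    """Same scores from the definition, at O(n^2) cost, for comparison."""
    n = len(receptor)
    return [sum(receptor[x] * ligand[(x - s) % n] for x in range(n))
            for s in range(n)]

# Toy grids: 1 marks cells that score favorably when occupied together.
receptor = [0, 0, 1, 1, 1, 0, 0, 0]
ligand = [1, 1, 1, 0, 0, 0, 0, 0]

scores = correlate_fft(receptor, ligand)
best_shift = max(range(len(scores)), key=lambda s: scores[s])
```

The FFT route costs on the order of n log n operations per grid instead of n squared, and that saving is what makes an exhaustive translational search practical on large three-dimensional grids.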
Design of Calcium-Deficient Calmodulin
Interactions of the Ca2+ sensor protein calmodulin (CaM) with calmodulin-dependent protein kinase II (CaMKII) are central to the Ca2+-signaling pathways implicated in learning and memory. Ca2+ signals of different magnitude and duration are sensed by CaM, which can bind up to four Ca2+ ions. Ca2+ binding to CaM induces a conformational change within the protein that is essential for recognition and activation of many CaM-regulated proteins, including CaMKII. CaMKII activated by Ca2+/CaM phosphorylates a number of downstream protein targets in synapses. The binding of all four Ca2+ ions to CaM is generally believed to be a prerequisite for CaM-induced activation of CaMKII. However, the observed Ca2+ concentrations during the periods of Ca2+ influx into the postsynaptic spine are too low to be consistent with this hypothesis.
To investigate whether CaM can activate CaMKII with only two bound Ca2+ ions, we designed two CaM mutants: one that binds Ca2+ ions only at the C-terminal domain (NMUTCWT), and one that binds Ca2+ only at the N-terminal domain (NWTCMUT). In each CaM mutant, the inactivated domain was designed by stabilizing it in the “closed” Ca2+-free conformation, while the other domain was kept intact. Ionization mass spectrometry confirmed the 2:1 Ca2+/CaM stoichiometry for the designed mutants. NMUTCWT could activate CaMKII at the low Ca2+ concentrations believed to occur in the postsynaptic density in spines. Our findings show that differential activation of signaling enzymes by partially saturated CaM may contribute to synaptic plasticity's sensitivity to the timing and magnitude of postsynaptic Ca2+ flux and suggest the need to reevaluate the sensitivity of other postsynaptic signaling enzymes to CaM containing fewer than four bound Ca2+ ions.
This work was supported by the Ralph M. Parsons Foundation, the Defense Advanced Research Projects Agency (DARPA), an IBM Shared University Research Grant, and the Institute for Collaborative Biotechnologies (ICB).