To carry out our prediction and design calculations, we have been developing a computer program called Rosetta. At the core of Rosetta are potential functions for computing the energies of interactions within and between macromolecules, and methods for finding the lowest energy structure for an amino acid sequence (protein-structure prediction) or a protein-protein complex and for finding the lowest energy amino acid sequence for a protein or protein-protein complex (protein design). Feedback from the prediction and design tests is used continually to improve the potential functions and the search algorithms. Development of one computer program to treat these diverse problems has considerable advantages: first, the different applications provide complementary tests of the underlying physical model (the fundamental physics/physical chemistry is, of course, the same in all cases); second, many problems of current interest, such as flexible backbone protein design and protein-protein docking with backbone flexibility, involve a combination of the different optimization methods.
Prediction of Protein Structures and Interactions
Our advances over the past several years have led us to the goal of computing the structures of proteins and protein-protein complexes at near-atomic resolution. Our model of interatomic interactions has reached sufficient accuracy that native structures are almost always significantly lower in energy than nonnative structures, and hence the problem of predicting the structure of a protein (or a protein-protein complex) has become primarily a search problem: starting from an extended chain we must sample close enough to the native free-energy minimum for the energy to drop lower than for all the nonnative conformations generated. To achieve adequate sampling, we have developed a multiscale search algorithm that begins with a broad low-resolution search over a wide range of conformations. It then shifts to a search using a detailed high-resolution representation for tightly packed conformations in which buried polar groups form hydrogen bonds to compensate for the loss of interactions with water.
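The two-stage idea described above can be illustrated with a deliberately simplified sketch. This is not Rosetta's algorithm or energy function: the one-dimensional `energy` landscape, the step sizes, the temperatures, and the function names are all hypothetical, chosen only to show a broad Metropolis Monte Carlo search followed by a fine-grained refinement that starts from the best low-resolution result.

```python
import math
import random


def energy(x):
    # Hypothetical rugged 1-D landscape (an assumption, not a Rosetta potential):
    # a broad funnel centered at x = 2 plus ripples that create local minima.
    return (x - 2.0) ** 2 + 1.5 * math.sin(8.0 * x)


def metropolis(x, step, temperature, n_steps, rng):
    """Run Metropolis Monte Carlo and return the lowest-energy point visited."""
    e = energy(x)
    best_x, best_e = x, e
    for _ in range(n_steps):
        trial = x + rng.uniform(-step, step)
        e_trial = energy(trial)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / temperature):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


def multiscale_search(start, seed=0):
    """Broad low-resolution search, then refinement from its best point."""
    rng = random.Random(seed)
    # Stage 1: large moves at high temperature to cross barriers and find the funnel.
    coarse = metropolis(start, step=2.0, temperature=2.0, n_steps=2000, rng=rng)
    # Stage 2: small moves at low temperature to settle into the nearby minimum.
    fine = metropolis(coarse[0], step=0.05, temperature=0.05, n_steps=2000, rng=rng)
    return coarse, fine


if __name__ == "__main__":
    (cx, ce), (fx, fe) = multiscale_search(start=-10.0, seed=1)
    print(f"coarse stage: x = {cx:.3f}, energy = {ce:.3f}")
    print(f"fine stage:   x = {fx:.3f}, energy = {fe:.3f}")
```

Because the refinement stage starts from the best point of the broad stage and only records improvements, its reported energy can never be higher than the low-resolution result. The real problem differs in that the two stages also use different representations of the chain, which this toy example does not attempt to model.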
Because of the large number of possible conformations for a protein chain, finding the lowest energy state is a formidable computational challenge, even with our improvements in the search algorithm. We developed a distributed computing project, called Rosetta@home, to meet this challenge. There are now more than 135,000 participants worldwide whose computers run Rosetta structure prediction and design calculations when not otherwise being used. The project has sparked considerable interest in biomedical research. Inspired by this, we are working with high school teachers to develop a minicurriculum for students that will explain the science around Rosetta@home. We are also developing a multiplayer, interactive video game version of Rosetta@home that we believe will be an excellent vehicle for learning and, by allowing people to work with each other and with their computers, may allow solution of difficult scientific problems.
Our improvements in Rosetta, together with the large computing power of Rosetta@home, have made it possible to predict, in some cases, the structures of small proteins with near-atomic resolution, and to accurately predict the structures of protein-protein complexes from the structures of the isolated proteins in cases where there are no significant backbone conformational changes upon complex formation. Particularly encouraging after years of work on high-resolution modeling are the near-atomic-resolution predictions of the structures of complexes in CAPRI (Figure 1), the 1.5-Å de novo prediction in CASP6 (Figure 2), and the close agreement of the Top7 (Figure 3, right) and protein-protein interface design models (Figure 4, left) with the x-ray crystal structures. These results suggest that high-resolution modeling is starting to work.
A recent highlight was the prediction of the structure of a CASP7 (Critical Assessment of protein Structure Prediction) target with sufficiently high accuracy that the x-ray crystallographic phase problem could be solved with the model. We are now applying the Rosetta structure prediction methodology to proteins for which experimental phasing of diffraction data has proved challenging, and building detailed models based on experimental electron density maps from cryoelectron microscopy. In the protein-protein docking area, we have developed methods for efficient modeling of large symmetric complexes, and for incorporating full backbone flexibility into the search. In collaboration with others, we are using these new methods, now incorporated into Rosetta, to build models of the amyloid fibrils associated with many human diseases.
Design of New Protein Functions
Several years ago, we developed a general computational strategy for designing new protein structures that incorporates full backbone flexibility into rotamer-based sequence optimization. This was accomplished by integrating ab initio protein structure prediction, atomic-level energy refinement, and sequence design in Rosetta. The procedure was used to design Top7, a 93-residue protein with a novel sequence and topology. Top7 was found to be folded and highly stable, and the x-ray crystal structure of Top7 is strikingly similar (RMSD = 1.2 Å; see right panel of Figure 3) to the design model. The design of a new globular protein structure and the close correspondence of the crystal structure to the design model have broad implications for protein design and protein-structure prediction and open the door to the exploration of the large regions of the protein universe not yet observed in nature.
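For readers unfamiliar with the RMSD figure quoted above, the following minimal sketch shows how a root-mean-square deviation between two sets of atomic coordinates is computed. It assumes the two structures have already been optimally superimposed and that atoms are listed in the same order; the superposition step itself is omitted, and the coordinates in the example are invented for illustration rather than taken from Top7.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z)
    coordinates, assumed to be pre-superimposed and in matching atom order."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between corresponding atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Hypothetical coordinates in angstroms: every atom displaced by 1.2 along y.
    model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    crystal = [(x, y + 1.2, z) for (x, y, z) in model]
    print(f"RMSD = {rmsd(model, crystal):.2f} angstroms")
```

A uniform 1.2 Å displacement of every atom gives an RMSD of exactly 1.2 Å. In a real comparison the deviations vary from atom to atom, and the reported value is their root-mean-square.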
Since the validation of our protein design methodology provided by Top7, we have focused on designing proteins with new and useful functions. We are concentrating on four challenges: (1) the design of new protein-protein interactions, (2) the design of new enzymes catalyzing reactions not catalyzed by naturally occurring enzymes, (3) the design of novel endonucleases with any specified cleavage specificity, and (4) the design of a vaccine for HIV.
Protein-Protein Interactions
We showed several years ago that computational protein design with Rosetta could be used to reprogram the specificity of protein-protein interactions. More recently, we have developed methodology for designing specific protein interaction surfaces into proteins that do not normally interact. We have succeeded in designing new interactions that form both in vitro and inside cells, and we are working to generate tight binding inhibitors for proteins exposed on the surface of pathogens.
Design of Novel Enzymes
To design enzymes capable of catalyzing any arbitrary chemical reaction, we have developed algorithms that rapidly identify protein scaffolds on which an ideal active site for the reaction under consideration can be built. These ideal active sites consist of the transition state for the chemical reaction and amino acid side-chain functional groups positioned so as to optimize catalysis; we use a combination of quantum mechanical calculations and chemical principles to obtain these sites. We then use the Rosetta protein design methodology to design the surrounding protein side chains to optimize binding to the transition-state model. Thus far, we have used this approach to design enzymes that catalyze the Kemp elimination reaction and a retroaldol condensation reaction. The rate enhancements in both cases are ~5 × 10⁴-fold, modest compared to those of naturally occurring enzymes, and we are working to improve the design methodology to create more active catalysts, and to design new enzymes for a broad range of medical, chemical, and energy-related applications.
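To put a rate enhancement of this size in energetic terms, the short calculation below converts a fold enhancement into an equivalent lowering of the activation free energy. It is a back-of-the-envelope sketch under simple transition-state theory, assuming that the catalyzed and uncatalyzed reactions share the same prefactor and that the temperature is 298.15 K; neither assumption comes from the text above, and the function name is hypothetical.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def barrier_reduction_kcal(rate_enhancement, temperature_k=298.15):
    """Activation free-energy lowering (kcal/mol) implied by a rate enhancement,
    using ddG = R * T * ln(k_cat / k_uncat) from simple transition-state theory."""
    if rate_enhancement <= 0:
        raise ValueError("rate enhancement must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(rate_enhancement)


if __name__ == "__main__":
    fold = 5e4  # rate enhancement of roughly 5 x 10^4
    print(f"{fold:.0e}-fold corresponds to about "
          f"{barrier_reduction_kcal(fold):.1f} kcal/mol")
```

Under these assumptions, a 5 × 10⁴-fold enhancement corresponds to roughly 6.4 kcal/mol of transition-state stabilization. Each further tenfold gain in rate requires about 1.4 kcal/mol more at this temperature.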
Design of Endonucleases with New DNA-Cleavage Specificities
In the past several years, we have extended the Rosetta protein design methodology to protein-RNA and protein-DNA interfaces and shown that new, highly specific endonucleases can be created by redesign of the extended DNA-binding interface in homing endonucleases. We are developing this methodology and designing new endonucleases that cleave within therapeutically important sites. For gene therapy applications, for example, we are designing endonucleases that cleave near the sites of mutations that cause disease; our collaborators will then experiment with correcting mutations in these genes through homologous recombination by introducing the designed endonuclease and a wild-type copy of the gene into mutant cells.
Design of a Vaccine for HIV
An effective immune response against HIV is thwarted by the high rate of mutation of the viral coat protein surface. There are a few regions of the virus surface that cannot mutate because they are required for cell recognition and entry, but only in a small number of individuals are antibodies produced that bind to these regions. Our approach is to design small proteins that are structural and sequence mimics of these Achilles' heel regions on the virus coat and, in collaboration with many research groups throughout the country, to test these designed immunogens to see whether they elicit antibody responses that neutralize the virus.
Improvement of Physical Model
Our approach to improving energy functions involves a combination of quantum chemistry calculations on simple model compounds, traditional molecular mechanics approaches, and protein structural analysis with feedback from our prediction and design tests. We have used such an approach to develop an improved hydrogen-bonding potential; a notable result is that the orientation dependence of the hydrogen bond in quantum chemistry calculations on formamide dimers is remarkably similar to that seen in side-chain–side-chain hydrogen bonds in protein structures but different from that in current molecular mechanics force fields, which neglect the covalent character of the hydrogen bond. Feedback from the prediction and design calculations has provided continual impetus and guidance for improving the energy function; for example, inadequacies in our treatment of protein-protein interactions have led to the recent development of a rotamer-based model for water-mediated hydrogen bonds.
Plans for the Future
We will continue to work to improve the physical model and the sampling methodology underlying the prediction and design calculations in Rosetta. On the prediction side, our goal is consistent near-atomic resolution structure prediction for small proteins; despite our progress there is considerable room for improvement in both the consistency and accuracy of the predictions. We also will work to develop general methods for producing high-resolution models from limited experimental data, and so broaden the range of biological processes that can be understood at atomic resolution. On the design side, we hope to develop new biomolecules with new functions—inhibitors, enzymes, endonucleases, and vaccines—that can have a positive impact on the world.