Design of New Functional Proteins

Research Summary

David Baker's research group focuses on designing protein-based therapeutics, nanomaterials, and catalysts to address problems in medicine and engineering. The group is also developing methods for solving macromolecular structures using sparse experimental data sets. 

The exquisite functions of naturally occurring proteins arose as solutions to the challenges confronted during biological evolution. Humans today face challenges that did not arise during natural evolution, and our research aims to design synthetic proteins to address them.

Figure 1: Screenshot of a FoldIt introductory level...

Proteins fold to their lowest free energy states, so designing new proteins requires finding amino acid sequences whose lowest energy states are the desired structures. To design synthetic proteins and to predict the structures of naturally occurring proteins, we have been developing a computer program called Rosetta. At its core is a model for the energies of interactions within and between molecules and algorithms for finding the lowest energy structure for an amino acid sequence (protein-structure prediction) and the lowest energy amino acid sequence for a given structure (protein design). Feedback from the prediction and design tests is used continually to improve the energy functions and the search algorithms.
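
The two complementary searches described above can be illustrated with a deliberately tiny toy model. The sketch below is not Rosetta's energy function or search algorithm: the two-letter residue alphabet, the buried/exposed "structure" states, and the names `energy`, `predict_structure`, and `design_sequence` are all assumptions made for illustration, and exhaustive enumeration is feasible only at this toy scale.

```python
from itertools import product

RESIDUES = "HP"  # toy alphabet (assumption): H = hydrophobic, P = polar
STATES = "BE"    # toy per-position structure (assumption): B = buried, E = exposed


def energy(sequence, structure):
    """Toy energy: -1 per favorable pairing (H buried, P exposed), +1 otherwise."""
    total = 0
    for res, state in zip(sequence, structure):
        favorable = (res == "H" and state == "B") or (res == "P" and state == "E")
        total += -1 if favorable else 1
    return total


def predict_structure(sequence):
    """Structure prediction: lowest-energy structure for a fixed sequence."""
    candidates = ("".join(s) for s in product(STATES, repeat=len(sequence)))
    return min(candidates, key=lambda s: energy(sequence, s))


def design_sequence(structure):
    """Protein design: lowest-energy sequence for a fixed structure."""
    candidates = ("".join(q) for q in product(RESIDUES, repeat=len(structure)))
    return min(candidates, key=lambda q: energy(q, structure))


designed = design_sequence("BEEB")
print(designed, predict_structure(designed))
```

In this toy model, predicting the structure of the designed sequence recovers the target structure, which is the kind of consistency check between prediction and design that the paragraph above describes.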

Our work is enabled by two advances in modern technology. First, computer power has steadily increased for decades, which is critical for our compute-intensive calculations; second, the genomics revolution has led to high-throughput and inexpensive methods for synthesizing DNA. The design efforts described below have two steps: (1) Rosetta modeling calculations to identify amino acid sequences predicted to have the desired structure and function, and (2) synthesis of DNA encoding these designed amino acid sequences, followed by introduction of the synthetic genes into bacteria and production of the designed proteins. With advances in computing and DNA synthesis, and our Rosetta methodology, we can now design and test dozens of novel proteins a week.
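
Step (2) begins with a DNA sequence that encodes each designed protein. A minimal sketch of that reverse-translation step follows. The codon table simply picks one valid codon per amino acid from the standard genetic code, and the function name `reverse_translate` is hypothetical; an actual gene synthesis workflow would also consider factors such as codon usage in the host organism, which are not covered here.

```python
# One valid codon per amino acid from the standard genetic code.
# The particular choice among synonymous codons is an arbitrary assumption.
CODON_FOR = {
    "A": "GCG", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGC", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAT",
}
STOP_CODON = "TAA"


def reverse_translate(protein):
    """Return a DNA coding sequence (with stop codon) for a one-letter protein string."""
    unknown = set(protein) - set(CODON_FOR)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return "".join(CODON_FOR[res] for res in protein) + STOP_CODON


print(reverse_translate("MKV"))
```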

Design of Novel Structures
It has been known for over 50 years that protein structures are determined by their amino acid sequences; thus, in principle it should be possible to design amino acid sequences not found in nature that fold into new structures. We have recently made considerable progress on this problem. We have arrived at general principles for constructing new protein structures and developed a robust approach for designing amino acid sequences that fold to structures consistent with these principles. This approach has been used to design many new protein structures; experimental characterization has shown that they are exceptionally stable and have structures nearly identical to the design models. Armed with this ability to create proteins with atomic-level control over the details of structure, we are tackling a variety of problems of current importance.

Design of Therapeutics and Vaccines
We are using protein design methods to create a new class of protein therapeutics and vaccines. Most drugs are small molecules that are relatively cheap to produce, but because of their small size it is difficult for them to block the protein-protein interactions that underlie many diseases. A newer class of drugs, antibody-based therapeutics, can block protein-protein interactions but is much more difficult and expensive to produce. We are designing proteins, intermediate in size between small-molecule and antibody-based drugs, that combine the highly specific interaction-blocking properties of antibodies with the production advantages of smaller molecules. These small proteins bind with high affinity and specificity to functional sites on target proteins. We have designed small proteins that bind to the influenza virus surface protein and prevent the virus from infecting cells. They are now being developed as potential anti-flu therapeutics in collaboration with a pharmaceutical company.

We are also designing small proteins to block other pathogens and toxins and to block human proteins that contribute to autoimmune disease and cancer. In addition, we are exploring the use of designed proteins for third-world diagnostics, for which the lower cost of production is important. Finally, we are designing proteins that mimic key regions on pathogens; such proteins could form the basis for a next generation of vaccines.

Design of Enzyme Catalysts
Naturally occurring enzymes are proficient catalysts. We have developed methods that can, in principle, be used to design protein catalysts for any chemical reaction. We are using these methods to design novel catalysts for a number of chemical reactions, including several not catalyzed by naturally occurring enzymes, and a new route to carbon fixation. These designed enzymes have considerably lower activity than naturally occurring enzymes; we are working to understand the origins of the activity differences and to design more active catalysts.

Design of Self-Assembling Protein-Based Nanomaterials
Familiar materials such as silk and wool are made from proteins, and naturally occurring protein nanostructures such as viral capsids and cytoskeletal filaments illustrate the vast diversity of shapes and functions accessible to self-assembling proteins. We are developing methods to design proteins that self-assemble into regular polyhedral cages for drug delivery and vaccine presentation and into two- and three-dimensional lattices for nanoscale devices such as sensors. An advantage of designed self-assembling proteins over lithography for patterning at the nanometer-length scale is the potential for atomic-level control over structure and chemistry.

Protein Structure Prediction and Determination
Since proteins fold to their lowest energy state, in principle it should be possible to compute the structures of proteins from their amino acid sequences. The problem is the very large number of configurations accessible to a protein chain. Over the past 10 years, we have made steady progress in predicting protein structure from amino acid sequence alone, but for proteins with over 100 amino acids it remains a difficult problem. Fortunately, the search for the lowest energy state can be made easier using additional information that is frequently available. If the structure of an evolutionary homologue has already been determined, information from this structure can be used to guide the search; using such information in Rosetta, we can generate very accurate structure models.
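
To give a rough sense of why the number of accessible configurations is the obstacle, the sketch below counts conformations under a simple assumed model. The figure of three states per residue and the enumeration rate of 10^15 conformations per second are illustrative assumptions, not values from this summary, and the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def count_conformations(n_residues, states_per_residue=3):
    """Number of chain conformations if each residue independently adopts one of k states."""
    return states_per_residue ** n_residues


def years_to_enumerate(n_residues, states_per_residue=3, rate_per_second=1e15):
    """Years needed to visit every conformation at a fixed (assumed) rate."""
    seconds = count_conformations(n_residues, states_per_residue) / rate_per_second
    return seconds / SECONDS_PER_YEAR


for n in (10, 50, 100):
    print(n, count_conformations(n), f"{years_to_enumerate(n):.3g} years")
```

Because the count grows exponentially with chain length, exhaustive enumeration is out of reach well before 100 residues under these assumptions, which is why extra information that narrows the search is so valuable.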

Structural biology is increasingly focused on the structures of biological machines and other assemblies that do not form the highly ordered states necessary for standard structure determination using x-ray crystallography. It is often possible to collect low-resolution x-ray diffraction data or low-resolution electron density maps using cryoelectron microscopy. Standard methods have difficulty generating accurate models with limited data, but we have found such sparse data sets useful in guiding Rosetta searches for the lowest energy structure. We have also developed methods for determining NMR structures using the limited data sets that can be collected for larger proteins. We are now focused on solving hybrid modeling problems where sparse data are available from a number of sources (cryoelectron microscopy, homologous structures, NMR) by using these data to guide Rosetta structure prediction calculations.

Involving the Public: Rosetta@home and FoldIt
Solving difficult scientific problems can require focusing all available resources. We believe the general public is a powerful and greatly underappreciated resource. A number of years ago, it became clear that our structure prediction efforts were limited by the number of alternative conformations that could be sampled using our in-house computing cluster. To solve this problem, we developed a distributed computing project called Rosetta@home, which sends our prediction and design calculations to volunteers in the general public and collects their results. Today, Rosetta@home has 350,000 volunteers and a computing power of over 100 teraflops. We are using Rosetta@home not only to predict protein structures but also for protein design; for example, the final step before we synthesize a gene encoding a designed amino acid sequence is to use Rosetta@home to determine whether it folds to the designed structure.
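
The idea of spreading conformational sampling across many volunteer machines and collecting the results can be sketched as follows. This is a toy illustration, not the Rosetta@home protocol: the one-dimensional `toy_energy` landscape, the random search inside each work unit, and the names `run_work_unit` and `collect_results` are all assumptions.

```python
import random


def toy_energy(x):
    """Toy one-dimensional energy landscape with its minimum at x = 0.3 (assumption)."""
    return (x - 0.3) ** 2


def run_work_unit(seed, n_samples=1000):
    """One volunteer's job: sample conformations independently, return the best (energy, x)."""
    rng = random.Random(seed)  # each work unit gets its own seed
    best_x = min((rng.uniform(-1.0, 1.0) for _ in range(n_samples)), key=toy_energy)
    return toy_energy(best_x), best_x


def collect_results(seeds):
    """Server side: keep the lowest-energy result returned by any work unit."""
    return min(run_work_unit(seed) for seed in seeds)


single = run_work_unit(0)
combined = collect_results(range(10))
print("one work unit:", single)
print("ten work units:", combined)
```

Because each work unit samples independently, adding volunteers can only keep or lower the best energy found, which is the sense in which more computing power relieves a sampling bottleneck.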

Suggestions by Rosetta@home volunteers led to a new way to engage the general public in research. When Rosetta@home runs on a computer, a screensaver appears showing the course of the prediction or design calculation. After watching the proteins folding up on their computers for some time, participants wrote in saying they thought they could improve on what the computer was doing and asked for a way to guide the calculations. To enlist human intuition in our prediction and design efforts, we teamed up with researchers in the University of Washington computer science department and developed an online multiplayer folding and design game called FoldIt. FoldIt players combine intuition-based manipulation of the structure and sequence of a protein with Rosetta structure and sequence optimization algorithms, collaborate with other players on their team, and compete with other teams to obtain the highest score (the Rosetta-calculated energy multiplied by -10, so that, as in a standard computer game, a higher score is better). In the past two years, FoldIt players have demonstrated that game players can contribute to scientific discovery: they have solved protein structures, developed new algorithms, and designed new proteins.
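
The score conversion described above is simple enough to write down directly. In the sketch below, only the multiplication by -10 comes from the text; the function name `foldit_score` and the example energies are hypothetical, and the actual game may apply further adjustments not described here.

```python
def foldit_score(rosetta_energy):
    """Convert an energy (lower is better) into a game score (higher is better)."""
    return -10.0 * rosetta_energy


# Hypothetical energies for two candidate structures of the same protein.
energies = {"player_a": -250.0, "player_b": -180.5}
scores = {name: foldit_score(e) for name, e in energies.items()}

# The lowest-energy structure is also the highest-scoring one.
winner = max(scores, key=scores.get)
print(scores, winner)
```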

We are excited to have channeled not only the computing power but also the brain power of people all over the world to solve biomedical research problems. Rosetta@home and FoldIt change the traditional relation between scientists and society by allowing anybody with access to a computer to contribute. We believe there are tremendous opportunities for education as well, as there is no better way to understand how science works than to be involved in a scientific project; more concretely, there is no better way to understand proteins than to directly grapple with them.

As of February 27, 2013

Scientist Profile

University of Washington
Biochemistry, Computational Biology