The past decade has seen a stunning increase in the accumulation of DNA sequences, resulting in the prediction of vast numbers of novel proteins. However, determining protein function remains difficult because of the tremendous range of biochemical activities that proteins display, the diverse modifications that proteins undergo, the multiplicity of proteins potentially encoded by a single gene, and the use of single proteins for multiple purposes. Even when the function of a protein is known, assessing how amino acid changes affect that function is formidable, yet physicians and patients urgently need this information to interpret the rapidly growing data on human genetic variation.
Our laboratory is interested in developing biological technologies, especially for analyzing protein function. Often we use the unicellular eukaryote Saccharomyces cerevisiae (baker's yeast) as the host organism for carrying out protein assays. Yeast—the first eukaryote to be sequenced—has a relatively small number of genes and is highly tractable for experimentation. In addition, yeast is a convenient host to express proteins from other organisms, a property we take advantage of in analyzing heterologous proteins, including human ones.
For the past few years, we have focused on a method, termed deep mutational scanning, that couples protein display technology to high-throughput DNA sequencing. Protein display methods physically link each protein to the DNA sequence that encodes it. When a population of protein variants displayed in this way is subjected to selection for function, variants with beneficial features become enriched and those with deleterious features become depleted. These changes in frequency can be measured by sequencing the encoding DNAs. By comparing the frequency of a given variant in the input and selected populations, we obtain a ratio that serves as an estimate of the variant's function.
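A minimal Python sketch of the ratio described above follows. It assumes that read counts per variant have already been tallied from the input and selected libraries. The pseudocount of 0.5, the log2 scale, normalization to the wild-type ratio, the function name, and the variant labels are illustrative assumptions, not a description of a specific published pipeline.

```python
import math


def enrichment_scores(input_counts, selected_counts, wild_type, pseudocount=0.5):
    """Log2 enrichment of each variant, normalized so that wild type scores 0.

    input_counts, selected_counts: dict mapping variant name -> read count.
    The pseudocount (an assumption) avoids division by zero for variants
    that drop out of the selected library.
    """
    # Library sizes convert raw counts into frequencies.
    n_input = sum(input_counts.values())
    n_selected = sum(selected_counts.values())

    def ratio(variant):
        f_input = (input_counts.get(variant, 0) + pseudocount) / n_input
        f_selected = (selected_counts.get(variant, 0) + pseudocount) / n_selected
        return f_selected / f_input

    wt_ratio = ratio(wild_type)
    # Score every variant observed in the input library relative to wild type.
    return {v: math.log2(ratio(v) / wt_ratio) for v in input_counts}


# Made-up counts: A10G is mildly enriched, L15P is strongly depleted.
input_counts = {"WT": 1000, "A10G": 500, "L15P": 500}
selected_counts = {"WT": 2000, "A10G": 1200, "L15P": 10}
scores = enrichment_scores(input_counts, selected_counts, "WT")
for variant, score in scores.items():
    print(f"{variant}\t{score:+.2f}")
```

Normalizing to wild type makes scores comparable across selections of different stringency, because a variant that behaves like wild type scores near zero regardless of how much the whole library shifted.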
Deep mutational scanning provides a quantitative measure of the function of hundreds of thousands of variants of a protein in a single experiment. The key ingredients of this approach (protein display, low-intensity selection, and high-throughput sequencing) are simple and widely available. Data from this approach can be used to construct protein sequence–function maps and to reveal fundamental protein properties. We are applying this approach to a WW domain, viral and yeast RNA-binding domains, E3 ubiquitin ligases, a degradation signal known as a degron, and a G-protein-coupled receptor. We are also extending it to synonymous codons and to a tRNA.
In one application, we identified mutations that increase the thermodynamic stability of a WW domain. We took advantage of the idea that a stabilizing mutation may rescue a destabilizing mutation when the two are combined in a double mutant. By measuring the ability of thousands of variants of the WW domain to bind to a peptide ligand, we identified 15 candidate stabilizing mutations. Two of these mutations are highly stabilizing, indicating that systematic analysis of large-scale protein functional data can reveal fundamental physicochemical properties such as stability.
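The rescue idea can be turned into a score for each mutation. The sketch below is a simplified proxy, not necessarily the analysis used in this work. It assumes log2 scores for single and double mutants (wild type = 0), takes the sum of the two single-mutant scores as the expected double-mutant score, and ranks each mutation by its mean excess over that expectation (positive epistasis) when paired with partners that are deleterious on their own. The cutoff, the minimum number of partners, the function name, and the toy mutation labels are all hypothetical.

```python
from collections import defaultdict


def rescue_ranking(single, double, deleterious_cutoff=-1.0, min_partners=3):
    """Rank mutations by mean positive epistasis with deleterious partners.

    single: dict mutation -> log2 score of the single mutant (wild type = 0).
    double: dict (mutation_a, mutation_b) -> log2 score of the double mutant.
    """
    excess = defaultdict(list)
    for (a, b), observed in double.items():
        if a not in single or b not in single:
            continue
        # Additive expectation on the log scale; positive means better than expected.
        epistasis = observed - (single[a] + single[b])
        # Credit a mutation only when its partner is deleterious on its own.
        if single[b] <= deleterious_cutoff:
            excess[a].append(epistasis)
        if single[a] <= deleterious_cutoff:
            excess[b].append(epistasis)
    means = {m: sum(v) / len(v) for m, v in excess.items() if len(v) >= min_partners}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Toy data: S1 partially rescues three deleterious mutations; N1 does not.
single = {"S1": 0.0, "N1": 0.0, "D1": -2.0, "D2": -2.0, "D3": -2.0}
double = {
    ("S1", "D1"): -0.5, ("S1", "D2"): -0.5, ("S1", "D3"): -0.5,
    ("N1", "D1"): -2.0, ("N1", "D2"): -2.0, ("N1", "D3"): -2.0,
}
print(rescue_ranking(single, double))  # [('S1', 1.5), ('N1', 0.0)]
```

Note that S1 looks neutral on its own (score 0) and only reveals itself through its partners, which is why a double-mutant screen can expose stabilizing mutations that a single-mutant screen would miss.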
Organisms use a relatively small repertoire of RNA-binding domains, with specificity achieved by the spatial organization of these domains within a protein and by the small sequence variations among structurally related domains. We studied the effect of mutations on the function of a common RNA-binding domain called the RNA recognition motif (RRM), which is present in the poly(A)-binding protein of yeast. Data on the ability of variants of this protein to function in yeast have allowed us to identify critical residues, to define consensus sequences for the two RNA-binding motifs within the RRM, and to identify a site of protein-protein interaction.
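One simple way to flag critical residues from such data is to average the scores of all substitutions at each position, since positions that tolerate few substitutions tend to be functionally important. A minimal sketch, assuming log2 scores keyed by mutation labels written as wild-type residue, position, new residue (for example Y12A). The notation, the cutoff of -1, the function names, and the example values are hypothetical and do not describe actual poly(A)-binding protein residues.

```python
import re
from collections import defaultdict

# Hypothetical notation: wild-type residue, position, substituted residue ("*" = stop).
MUTATION = re.compile(r"([A-Z])(\d+)([A-Z*])")


def position_tolerance(scores):
    """Mean log2 score of all substitutions observed at each position."""
    by_position = defaultdict(list)
    for mutation, score in scores.items():
        match = MUTATION.fullmatch(mutation)
        if match is None:
            continue  # skip the wild-type entry or malformed labels
        by_position[int(match.group(2))].append(score)
    return {pos: sum(v) / len(v) for pos, v in sorted(by_position.items())}


def critical_positions(scores, cutoff=-1.0):
    """Positions whose average substitution is at least as damaging as the cutoff."""
    return [pos for pos, mean in position_tolerance(scores).items() if mean <= cutoff]


# Made-up scores for two positions.
scores = {"WT": 0.0, "Y12A": -3.0, "Y12F": -2.0, "K13A": 0.1, "K13E": -0.3}
print(position_tolerance(scores))
print(critical_positions(scores))  # [12]
```

Averaging hides which specific substitutions are tolerated, so a consensus sequence would instead be read from the per-residue scores at each position.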
Substrates destined for degradation by the ubiquitin-proteasome system are tagged with ubiquitin through the sequential action of E1, E2, and E3 enzymes. E3 ubiquitin ligases specify the substrate and catalyze the transfer of ubiquitin from an E2 ubiquitin-conjugating enzyme. The mechanism by which E3 enzymes catalyze this transfer is not well understood. We scored the activity of ~100,000 variants of a domain of the mammalian E3 ligase Ube4b and found mutations that dramatically increase E3 ligase activity. Biochemical and structural analyses of these mutant proteins have provided mechanistic insight into elements of E3 catalysis. We are taking a similar approach with the human tumor suppressor protein BRCA1, in an effort to correlate biochemical activity with cancer risk.
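Calling variants that are more active than wild type from such a screen benefits from requiring agreement between independent selections, because a single high score can reflect sampling noise. A minimal sketch, assuming log2 activity scores normalized so that wild type is 0; the replicate design, the two-fold threshold (1 in log2 units), the function name, and the variant labels are assumptions for illustration.

```python
def hyperactive_variants(replicate_scores, min_log2=1.0):
    """Variants scoring at least min_log2 above wild type in every replicate.

    replicate_scores: list of dicts, one per independent selection,
    each mapping variant name -> log2 activity score with wild type at 0.
    """
    # Only variants measured in all replicates can be called.
    shared = set.intersection(*(set(r) for r in replicate_scores))
    hits = [v for v in shared if all(r[v] >= min_log2 for r in replicate_scores)]
    # Rank by the weakest replicate so one noisy high score cannot drive the order.
    return sorted(hits, key=lambda v: min(r[v] for r in replicate_scores), reverse=True)


# Made-up scores from two replicate selections.
replicates = [
    {"WT": 0.0, "M1": 2.5, "M2": 1.2, "M3": 3.0},
    {"WT": 0.0, "M1": 2.0, "M2": 1.5, "M3": 0.2},
]
print(hyperactive_variants(replicates))  # ['M1', 'M2']
```

M3 is excluded despite the highest single score because it fails to reproduce in the second replicate.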
A degron is a primary degradation signal within a substrate that is recognized by E3 enzymes. We mapped the sequence–function relationship of a degron by fusing it to a yeast metabolic enzyme. The degron causes rapid degradation of the fusion protein, so yeast carrying it fail to grow unless a specific amino acid is added to the medium. We used this simple nutritional selection to identify mutations that affect the activity of the degron. This method has the potential to rapidly characterize the in vivo stability of proteins.
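Why a growth selection reports degron activity can be seen in a toy model. Suppose each variant's cells double at a constant rate in medium lacking the amino acid: an intact degron keeps the enzyme scarce and growth slow, while a disrupted degron stabilizes the fusion and allows faster growth. Under exponential growth, the log2 change in the ratio of two variants equals their difference in doubling rate multiplied by the selection time. The rates, times, function name, and labels below are hypothetical.

```python
import math


def grow(freqs, doublings_per_hour, hours):
    """Toy selection: each variant grows exponentially at its own (assumed constant) rate.

    freqs: dict variant -> starting frequency in the population.
    doublings_per_hour: dict variant -> growth rate in selective medium.
    Returns frequencies after growth, renormalized to sum to 1.
    """
    grown = {v: f * 2 ** (doublings_per_hour[v] * hours) for v, f in freqs.items()}
    total = sum(grown.values())
    return {v: g / total for v, g in grown.items()}


start = {"intact_degron": 0.5, "broken_degron": 0.5}
rates = {"intact_degron": 0.1, "broken_degron": 0.3}
after = grow(start, rates, hours=10)

# Under this model the log2 shift in the broken/intact ratio is (0.3 - 0.1) * 10 = 2.
shift = math.log2((after["broken_degron"] / after["intact_degron"])
                  / (start["broken_degron"] / start["intact_degron"]))
print(after)
print(shift)
```

The same relationship explains why the enrichment ratio behaves as a functional score: variants that weaken the degron enrich in proportion to how much they improve growth.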
Although synonymous codons encode identical amino acids, variation in these codons can lead to subtle alterations in protein production. We generated a library of synonymous variants in the yeast HIS3 gene and are using the functional score for each synonymous variant to explore the relative contributions of factors such as mRNA secondary structure and codon usage bias to protein expression.
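One standard measure of codon usage bias is the codon adaptation index (CAI) of Sharp and Li: each codon's relative adaptiveness w is its usage divided by that of the most-used synonymous codon, and the CAI of a sequence is the geometric mean of w over its codons. The text does not say which bias measure is used, so this is only an illustration. The usage table below is a made-up toy covering two amino acids, not real yeast data; a real calculation would use a genome-wide reference set and typically excludes amino acids encoded by a single codon.

```python
import math

# Toy codon usage counts (made up) and the amino acid each codon encodes.
USAGE = {"GCT": 60, "GCC": 30, "GCA": 10, "GGT": 80, "GGC": 20}
AMINO_ACID = {"GCT": "A", "GCC": "A", "GCA": "A", "GGT": "G", "GGC": "G"}


def relative_adaptiveness(usage, amino_acid):
    """w(codon) = usage of the codon / usage of the most frequent synonymous codon."""
    best = {}
    for codon, count in usage.items():
        aa = amino_acid[codon]
        best[aa] = max(best.get(aa, 0), count)
    return {codon: count / best[amino_acid[codon]] for codon, count in usage.items()}


def cai(sequence, w):
    """Geometric mean of w over the codons of an in-frame coding sequence."""
    codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
    logs = [math.log(w[c]) for c in codons]
    return math.exp(sum(logs) / len(logs))


w = relative_adaptiveness(USAGE, AMINO_ACID)
# Two synonymous variants encoding the same dipeptide, Ala-Gly.
print(cai("GCTGGT", w))  # 1.0, the preferred codon at both positions
print(cai("GCAGGC", w))  # lower, because both codons are rarer
```

Computing such a value for every synonymous variant would allow it to be compared directly with the variant's functional score.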
Transfer RNAs (tRNAs), critical for translating genetic information into cellular function, must adopt a specific and conserved three-dimensional structure to interact with the ribosome, with elongation factors, and with the appropriate aminoacyl tRNA synthetases. To define the set of functional tRNAs and to determine the extent to which a particular tRNA can tolerate mutation, we are collaborating with the laboratories of Eric Phizicky and David Mathews (University of Rochester) to adapt deep mutational scanning to the study of tRNA function.
In other recent work, we have begun efforts in genome engineering in yeast. Our goal is to optimize pathways that lead to the production of a desired compound, such as a drug, or to a desired phenotype, such as resistance to high ethanol concentrations in the growth medium. By making numerous changes to the expression levels of genes in a pathway, we hope to identify strains with, for example, increased production of a drug or resistance to ethanol. Project aims include generating cells with tunable gene expression, improving the ability to replace wild-type sequences with multiple variant sequences, transforming strains with a collection of mutagenized transcription factors, and engineering sensors to report the abundance of specific compounds.
Some of this research was supported by grants from the National Institutes of Health.
As of March 22, 2013