Technologies for Protein Analysis

Summary: Stanley Fields develops biological assays to analyze the function of proteins, often using the yeast Saccharomyces cerevisiae as a model organism for assays that can be applied to proteins from any organism.
The past decade has seen a profusion of whole-genome sequences, with total DNA sequence accumulation in GenBank now well over 100 billion bases. Genome sequences have led to the prediction of large complements of proteins, ranging from a few thousand in bacterial species to more than 20,000 for humans and other mammalian species. However, the determination of protein function remains a difficult task, given the tremendous range of biochemical activities that proteins display, the diverse modifications that a protein can undergo during its lifetime, the multiplicity of proteins potentially encoded by a single gene, and the use of proteins for more than a single function. It is also difficult to assess the functional consequences of amino acid changes in proteins, even in cases when a function is known.
Our laboratory is interested in developing biological technologies, especially those to analyze protein function. For many of our efforts, we use the unicellular eukaryote Saccharomyces cerevisiae (baker's yeast) as the host organism for carrying out protein assays. Yeast—the first eukaryote to be sequenced—has a relatively small number of genes, is highly tractable for experimentation, and has been used to derive numerous sets of reagents and high-throughput data. As a consequence, the set of yeast proteins is particularly advantageous for testing new technologies. In addition, yeast is a convenient host to express proteins from many other organisms, and we have taken advantage of this property to analyze several sets of such heterologous proteins.
For the past few years, we have focused on a method, termed deep mutational scanning, that couples protein display technology to high-throughput DNA sequencing. Protein display methods physically link proteins and the DNA sequences that encode them. When protein variants in such a method are put under a selection for function, those with beneficial features enrich in the population and those with deleterious features become depleted. These changes in frequency can be determined by sequencing of the encoding DNAs. By comparing the frequency of a given variant in a selected population to its frequency in the input library, we obtain a ratio that is an estimate of the variant's function. Deep mutational scanning thus provides the function, or fitness, of hundreds of thousands of variants of a protein in a single experiment.
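The frequency comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than the laboratory's actual pipeline: the function name `enrichment_ratios`, the variant labels, the pseudocount (added so that variants missing from one pool do not produce a division by zero), and the log2 scale are all assumptions made for the example.

```python
import math


def enrichment_ratios(input_counts, selected_counts, pseudocount=0.5):
    """Return a log2 enrichment ratio for every variant seen in either pool.

    input_counts / selected_counts map a variant label to its read count.
    The pseudocount is an illustrative choice, not a value from the text.
    """
    variants = set(input_counts) | set(selected_counts)
    n = len(variants)
    total_in = sum(input_counts.values()) + pseudocount * n
    total_sel = sum(selected_counts.values()) + pseudocount * n

    ratios = {}
    for variant in variants:
        # Frequency of the variant in each pool, after adding the pseudocount.
        f_in = (input_counts.get(variant, 0) + pseudocount) / total_in
        f_sel = (selected_counts.get(variant, 0) + pseudocount) / total_sel
        ratios[variant] = math.log2(f_sel / f_in)
    return ratios


if __name__ == "__main__":
    # Hypothetical read counts for a wild-type sequence and one variant.
    before = {"WT": 100, "A5G": 100}
    after = {"WT": 100, "A5G": 25}
    for variant, score in sorted(enrichment_ratios(before, after).items()):
        print(variant, round(score, 3))
```

A positive score means the variant became more frequent under selection, and a negative score means it was depleted.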
The key ingredients of this approach—protein display, low-intensity selection, and highly accurate, high-throughput sequencing—are simple and becoming widely available. Deep mutational scanning data can be used to construct protein sequence-function maps and to reveal fundamental protein properties.
Identifying Mutations That Stabilize Proteins
Enhancing protein stability is often critical for industrial and pharmaceutical applications. Stabilizing mutations permit acquisition of other, destabilizing mutations that improve function. This phenomenon can be observed as epistasis, where multiple mutations combine with unpredictable fitness effects. We identified stabilizing mutations in a WW domain based solely on parallel measurement of the fitness of 47,000 variants for binding to a peptide ligand and subsequent calculation of >5,000 epistasis scores. We introduced an epistasis-based metric, "partner potentiation," that identified 15 candidate stabilizing mutations. In collaboration with the laboratory of Jeffery Kelly (Scripps Research Institute), we tested six novel candidates by thermal denaturation and found two highly stabilizing mutations, one more stabilizing than any previously known mutation in this protein. Thus, systematic analysis of large-scale protein fitness data can reveal fundamental physicochemical properties such as stability.
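To make the idea of an epistasis score concrete, the sketch below uses one common definition: the observed fitness of a double mutant minus the fitness expected if the two single mutations acted independently (a multiplicative model, with fitness expressed relative to wild type). This definition is an assumption chosen for illustration and may differ from the scoring used in the work described here; the "partner potentiation" metric is not reproduced because its formula is not given in the text.

```python
def epistasis_score(w_double, w_single_a, w_single_b):
    """Observed double-mutant fitness minus the multiplicative expectation.

    All fitness values are assumed to be relative to wild type (wild type = 1).
    A positive score means the pair performs better than expected from the
    two single mutants; a negative score means it performs worse.
    """
    expected = w_single_a * w_single_b
    return w_double - expected


if __name__ == "__main__":
    # Hypothetical values: two individually harmful mutations that are
    # nearly neutral in combination, giving a positive epistasis score.
    print(epistasis_score(w_double=0.9, w_single_a=0.5, w_single_b=0.6))
```

Under this definition, a mutation that repeatedly yields positive scores with many different partners would be a natural candidate for a stabilizing mutation.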
Analysis of an RNA-Recognition Motif
Throughout its life, an RNA molecule associates with diverse RNA-binding proteins that regulate its processing and function. RNA-binding proteins use a relatively small repertoire of RNA-binding domains to fulfill their function, with specificity achieved by the spatial organization of the RNA-binding domains within a single RNA-binding protein and by the small sequence variations between structurally related domains. We are studying the effects of sequence variations on the function of a common RNA-binding domain called the RNA recognition motif (RRM), which is present in the poly(A)-binding protein (Pab1) of yeast. In our system, the endogenous PAB1 gene is shut off and yeast cells become reliant on the performance of a plasmid-borne mutant PAB1 for growth. Sequencing of the library of variants before and after the shutoff allows us to assess the effect of mutations. Data on ~1 million variants have allowed us to identify functionally important residues that were previously unknown and to define function-based consensus sequences for the two RNA-binding motifs within the RRM. We are also using this data set to identify mutations that cause an unexpected fitness effect when combined with other mutations in a single variant.
Binding to RNA by the HIV-1 Tat Protein
The HIV-1 Tat protein is integral to the viral life cycle, as it can induce efficient transcription of the virus by binding to a folded element of the HIV long terminal repeat called TAR. Previous studies have elucidated the effects of some mutations of Tat, but the overall depth and density of the studied mutations are low. By creating a library of hundreds of thousands of variants of Tat and selecting for binding to TAR using a yeast three-hybrid assay, we are examining the relationship between the sequence of Tat and its TAR-binding function. The Tat-TAR interaction is thought to be driven by an enrichment of basic residues in the core of the protein rather than a specific amino acid sequence, but it is not known whether point mutations outside of this core region can affect the TAR interaction. Analysis of mutations that affect the affinity of Tat for TAR can contribute to our understanding of protein-RNA interactions, as well as the mechanism of HIV transcription and activation.
Catalysis by an E3 Ubiquitin Ligase
The ubiquitin proteasome system governs most of the regulated proteolysis in eukaryotes. Substrates destined for proteasomal degradation are often modified with ubiquitin, which is attached to these substrates by a series of enzymes called E1, E2, and E3. E3 ubiquitin ligases specify the substrate and catalyze the transfer of ubiquitin from an E2 ubiquitin-conjugating enzyme to that substrate. Though E3 enzymes are a large and well-studied class of proteins, little is known about how they catalyze this transfer. We constructed a library of ~1 million variants of the U-box domain of the mammalian E3 ligase Ube4b displayed on the surface of T7 phage. With the addition of E1 and E2 enzymes, a tagged version of ubiquitin, and ATP, the phage displaying Ube4b catalyzed auto-ubiquitination. The tag on the ubiquitin allowed us to select enzymatically active versions of Ube4b. We found that mutations at a single residue of the E3 can lead to dramatically increased E3 ligase activity. We are carrying out biochemical and structural analyses of these mutant proteins to gain insight into the elements of E3 catalysis.
Activity of a Protein Degradation Signal
A primary degradation signal within a substrate that is recognized by E3 enzymes is known as a degron. We designed a strategy to map the sequence-function relationship of a known degron by combining a simple genetic tool with high-throughput sequencing. Our system is based on the fact that yeast cells that express the URA3 gene grow in the absence of uracil, but die in the presence of 5-fluoro-orotic acid (5-FOA) because the Ura3 enzyme converts 5-FOA to a toxin. We can alter the stability of the Ura3 enzyme by fusing it to a degron that leads to rapid degradation, and thus alter the uracil-dependent growth and 5-FOA sensitivity of the yeast cells. We optimized this system using a well-characterized degradation signal, Deg1 from the Matα2 protein, fused to Ura3. To query mutations in the degron for their ability to stabilize or destabilize Ura3, we replaced the wild-type Deg1 sequence with a library of mutations in the N-terminal 33–amino acid region of Deg1. By comparing the number of times each degron mutant appears in the input pool versus in selected pools, we can determine how mutations affect the activity of the degron. This simple technique is also being applied to other biological questions that revolve around protein stability.
Synonymous Variation and Fitness in a Yeast Model Gene
Although synonymous codons encode identical amino acids, variation in a gene's synonymous codons can lead to subtle alterations in protein production and exert significant phenotypic effects. To investigate factors important to synonymous codon usage and protein production, we are using as a model the yeast HIS3 gene, which encodes an enzyme required for the synthesis of histidine. After constructing a plasmid library of synonymous HIS3 variants, we allow yeast cells carrying the variants to compete for their ability to grow in media lacking histidine. Cells in a population that carry HIS3 variants with beneficial synonymous changes increase in the population based upon the competitive fitness of their HIS3 genes, whereas cells carrying deleterious variants decrease. Sequence analysis of the DNA plasmids from population samples grown under histidine selection allows us to calculate enrichment scores for each synonymous variant. With these data, we are exploring the relative fitness contributions of factors such as mRNA secondary structure and codon usage bias.
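As an illustration of what a synonymous variant is, the following sketch enumerates every DNA sequence that encodes the same amino acids as a short coding sequence, using the standard genetic code. The function name `synonymous_variants` and the six-base example are hypothetical. Exhaustive enumeration is practical only for very short sequences, so a library covering a whole gene would necessarily be a sample of this space; how the HIS3 library was constructed is not described in the text.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Reverse lookup: amino acid -> all codons that encode it.
AA_TO_CODONS = defaultdict(list)
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS[aa].append(codon)


def synonymous_variants(coding_dna):
    """Yield every DNA sequence encoding the same protein as coding_dna."""
    if len(coding_dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [coding_dna[i:i + 3] for i in range(0, len(coding_dna), 3)]
    # For each position, the list of codons synonymous with the original.
    choices = [AA_TO_CODONS[CODON_TO_AA[codon]] for codon in codons]
    for combination in product(*choices):
        yield "".join(combination)


if __name__ == "__main__":
    # Met-His: methionine has one codon and histidine has two.
    print(sorted(synonymous_variants("ATGCAT")))
```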
Enrich: Software for Analysis of Protein Function by Enrichment and Depletion of Variants
We developed Enrich, a tool for analyzing deep mutational scanning data. Enrich identifies all unique variants (mutants) of a protein in high-throughput sequencing data sets and can correct for sequencing errors using overlapping paired-end reads. Enrich uses the frequency of each variant before and after selection to calculate an enrichment ratio, which is used to estimate fitness. Enrich provides an interactive interface to guide users. It generates user-accessible output for downstream analyses as well as several visualizations of the effects of mutation on function, thereby allowing the user to rapidly quantify and comprehend sequence-function relationships. Enrich is implemented in Python and is available under a FreeBSD license. Enrich includes detailed documentation as well as a small example data set.
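The general idea behind error correction with overlapping paired-end reads can be sketched as follows. This is not Enrich's code or its documented algorithm. It is a simplified illustration that assumes the two reads cover exactly the same region, that quality scores are already decoded to integers, and that a disagreement is resolved in favor of the higher-quality base. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_overlapping_pair(fwd_seq, fwd_qual, rev_seq, rev_qual):
    """Merge two fully overlapping reads into one consensus sequence.

    fwd_qual and rev_qual are lists of integer quality scores, one per base.
    Where the reads disagree, the base with the higher quality score is kept
    (ties go to the forward read).
    """
    if len(fwd_seq) != len(rev_seq):
        raise ValueError("this sketch assumes reads of equal length")
    # Put the reverse read and its qualities in the forward orientation.
    rev_oriented = reverse_complement(rev_seq)
    rev_qual_oriented = rev_qual[::-1]

    merged = []
    for f_base, f_q, r_base, r_q in zip(
        fwd_seq, fwd_qual, rev_oriented, rev_qual_oriented
    ):
        if f_base == r_base or f_q >= r_q:
            merged.append(f_base)
        else:
            merged.append(r_base)
    return "".join(merged)


if __name__ == "__main__":
    # Hypothetical pair: the forward read carries a low-quality error (G)
    # at the third base; the reverse read supports T at that position.
    print(merge_overlapping_pair("ACGT", [30, 30, 10, 30],
                                 "AAGT", [35, 35, 35, 35]))
```

A stricter variant of this approach would discard a read pair outright when the two reads disagree, trading read depth for accuracy.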
Some of this research was supported by grants from the National Institutes of Health.
As of May 30, 2012