Our laboratory develops and applies new approaches to chemical and biological discovery that are driven by the principles underlying biological evolution. Over the past five years we initiated two new lines of research. The first aims to precisely manipulate information flow in mammalian cells through the use of synthetic regulatory elements (SREs): proteins and nucleic acids that modify genes and gene products with tailor-made specificities. Achieving this ambitious goal requires addressing two long-standing challenges in the molecular life sciences: (1) the development of a system for the continuous directed evolution of proteins and nucleic acids, and (2) the development of a general platform for the delivery of macromolecules into mammalian cells in vitro and in vivo. My group's research efforts over the past five years resulted in new solutions to both challenges.
The second new line of research seeks to discover cellular nucleic acids with novel structures and functions. This research resulted in the discovery of cofactor-linked RNAs, geranylated RNAs, and a new DNA nucleotide. In a third line of research, we continue to pioneer the development and application of DNA-templated synthesis (DTS), resulting in the discovery of new small-molecule inhibitors of disease-associated proteins and three new chemical reactions over the past five years. A sample of research progress (not comprehensive) over the past five years in each of these three areas is summarized below.
The Creation and Use of Synthetic Regulatory Elements in Mammalian Cells
Our emerging understanding of living systems has made increasingly realistic the possibility of manipulating cellular information flow by precisely altering genome or proteome structure, engineering new conditions under which genes are expressed, or creating new information-processing circuits that interface exogenous molecules with genes. The purposeful manipulation of information flow in living systems has been a long-standing goal of the molecular life sciences, and if applied to humans could lead to breakthrough diagnostics and therapeutics that respond in sophisticated ways to disease. Two daunting challenges that must be overcome before this vision can be realized are (1) how to efficiently create SREs that alter genomes, transcriptomes, or proteomes, and (2) how to deliver macromolecules into mammalian cells in vitro and in vivo in ways that preserve their activities and minimize undesired side effects.
A system for the continuous directed evolution of proteins and nucleic acids. Macromolecule engineering and directed evolution efforts in our laboratory and in other groups have resulted in a number of successes, including recombinases that target pathogen genomes, synthetic riboswitches that respond to small molecules, ligand-dependent inteins that render protein activities dependent on a cell-permeable small molecule, and nucleases with tailor-made DNA specificities. Such examples highlight the difficulty of generating these macromolecules, as each required several person-years of effort. To realize a vision in which arbitrary genes, transcripts, or proteins are manipulated at will in mammalian cells, we must develop much more rapid and effective approaches to generating SREs.
Conventional directed evolution involves discrete cycles of mutagenesis, transformation or in vitro expression, screening or selection, and gene harvesting. Although successful evolution is strongly dependent on the total number of rounds performed, the labor- and time-intensive nature of this process limits laboratory evolution efforts to a modest number of rounds. Continuous directed evolution, in which mutation, transformation, selection, and replication take place constantly, without requiring researcher intervention, has the potential to enhance dramatically the speed and effectiveness of laboratory evolution. Continuous evolution in the laboratory, however, has been previously implemented only in one landmark example by Gerald Joyce and his colleagues (Scripps Research Institute), who used a method to evolve ribozymes that cannot be easily adapted to other biomolecules.
We harnessed the life cycle of filamentous bacteriophage to develop a system for the continuous directed evolution of proteins and nucleic acids. In phage-assisted continuous evolution (PACE), Escherichia coli host cells continuously flow through a "lagoon" containing an actively evolving population of phage DNA vectors encoding the genes of interest. During PACE, the desired activity is linked to the production of a protein required for the production of infectious progeny phage containing the evolving genes. Due to the speed of the phage life cycle, PACE can mediate ~40 complete rounds of evolution per day, a 100-fold increase in speed over conventional methods. PACE requires no intervention during evolution and obviates the need to create DNA libraries, transform cells, or extract genes during each round.
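The selection principle described above can be illustrated with a minimal numerical sketch. It assumes that phage unable to trigger production of the required protein make no infectious progeny and are therefore diluted out of the lagoon by the continuous flow, while active phage replicate faster than they are washed out. The function name and all rate constants are hypothetical values chosen for illustration, not measured PACE parameters, and the model ignores mutation and host-cell limits.

```python
def simulate_lagoon(hours, dilution_rate=2.0, active_growth=4.0,
                    inactive_growth=0.0, start_active=1.0,
                    start_inactive=1e6, dt=0.01):
    """Euler integration of two phage populations in a constantly diluted lagoon.

    All rates are per hour and are illustrative assumptions only.
    Each population changes as dN/dt = (growth - dilution) * N.
    """
    active, inactive = start_active, start_inactive
    steps = int(round(hours / dt))
    for _ in range(steps):
        # Active phage produce infectious progeny faster than the outflow removes them.
        active += (active_growth - dilution_rate) * active * dt
        # Inactive phage make no infectious progeny and are only diluted.
        inactive += (inactive_growth - dilution_rate) * inactive * dt
    return active, inactive


if __name__ == "__main__":
    for t in (0, 2, 5, 10):
        a, i = simulate_lagoon(t)
        print(f"t = {t:>2} h  active = {a:.3g}  inactive = {i:.3g}")
```

Under these assumed rates, a single active variant overtakes a million-fold excess of inactive phage within hours, which is the qualitative behavior that lets selection proceed without researcher intervention.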
We used PACE to continuously evolve three new activities in a protein enzyme. Although the starting enzyme exhibits undetectable levels of activity in two of these cases, enzymes with at least wild-type levels of each target activity emerged in only 1–8 days of PACE, corresponding to 45–200 rounds of evolution. By accelerating directed evolution ~100-fold, PACE may provide solutions to otherwise intractable directed evolution problems and address novel questions about molecular evolution.
Supercharged proteins as a general macromolecule delivery platform. The development of a general platform to deliver macromolecules into mammalian cells in vitro and in vivo would be a key step toward the application of SREs in living systems and would also address the major challenge facing the use of proteins and nucleic acids as intracellular probes and therapeutic agents.
We discovered that "supercharging" proteins—replacing their surface-exposed residues with Lys/Arg or with Glu/Asp—can impart immunity to aggregation without abolishing their native fold or function. We also found that superpositively charged proteins, such as +36 GFP (an engineered green fluorescent protein with a theoretical net charge of +36), potently enter mammalian cells, including four cell lines resistant to traditional transfection methods. Supercharged GFPs also delivered small interfering RNA (siRNA) and plasmid DNA potently and without cytotoxicity in these cells, resulting in siRNA-based gene silencing or plasmid-based gene expression. +36 GFP also rapidly delivered proteins, including active, nonendosomal ubiquitin and Cre recombinase, into all five cell lines tested, with up to 100-fold greater potency than the currently used protein transduction domains (PTDs) Tat, Arg10, and penetratin. We have also gained insights into the mechanism of cell entry and the mechanistic basis of the macromolecule delivery potency of supercharged proteins.
We recently discovered a substantial class of natural proteins with similar cell penetration and macromolecule delivery properties both in vitro and in vivo. Members of this class of proteins may represent a more immunologically tolerated platform for in vivo protein delivery than other macromolecular agents. Moreover, the ability of many of these proteins to penetrate cells has not been previously reported, and our findings raise the possibility that this substantial class of natural proteins may play undiscovered biological roles that arise from their previously unknown ability to penetrate cells.
A Broad Search for New Structures and Functions in Cellular Nucleic Acids
Over the past few decades, RNA has emerged as much more than an intermediary in biology's central dogma, and many RNAs are now known to play a wide range of catalytic, regulatory, or defensive roles in the cell. In contrast to its broad functional diversity, the known chemical diversity of biological RNA has remained limited primarily to canonical ribonucleotides, 3'-aminoacylated tRNAs, modified nucleobases, and 5'-capped mRNAs. This disparity between RNA's known functional and chemical diversity, coupled with the powerful properties of synthetic small-molecule–nucleic acid conjugates, led us to speculate that small-molecule–RNA conjugates beyond those previously described may exist in cells as evolutionary fossils or as RNAs with novel functions enabled by their modifications.
Discovery of cellular CoA-RNA, NAD-RNA, geranylated RNA, and a new DNA nucleotide. We developed and implemented three increasingly general chemical screens to discover cellular small-molecule–RNA conjugates. In contrast with previous efforts, our methods do not depend on any specific type of small-molecule structure or any particular biological function of the conjugate. Once the structures of the small molecules are solved, we isolate the attached RNA(s) by using chemoselective capture or by testing specific RNAs that we hypothesize may be linked to the small molecules.
Using this approach, we identified coenzyme A (CoA) and several CoA thioesters (acetyl-, malonyl-, and methylmalonyl-CoA) as covalent conjugates to cellular RNA in E. coli and Streptomyces venezuelae. Our experiments indicate that the CoA-derived RNAs are under ~200 nucleotides in length, contain CoA at their 5' ends, and are probably not generated by aberrant transcriptional initiation. We are using thiol capture and high-throughput sequencing to identify these RNAs as the next step toward understanding their biological roles.
Our methods have also revealed NAD-linked RNA, which we discovered in E. coli, S. venezuelae, Bacillus subtilis, Enterobacter aerogenes, and Vibrio fischeri, and in bovine RNA. NAD-RNA is surprisingly abundant, at ~3,300 copies per E. coli cell, a level comparable to the total number of mRNA molecules in an E. coli cell. The most recent small-molecule–RNA conjugates discovered with our approach are two lipid-like nucleotide derivatives, which we isolated from a variety of organisms. These small-molecule–RNA conjugates include the first cofactor-linked RNA isolated from cells. More generally, our findings reveal that the chemical diversity of biological RNA in modern cells is greater than previously understood.
In collaboration with Anjana Rao (Harvard Medical School), we elucidated the structure of a new DNA nucleotide generated in mammalian cells upon overexpression of TET1, an enzyme that Rao and coworkers hypothesized may be a methylcytosine-modifying enzyme. Using the comparative mass spectrometry (MS) screening methods that we developed to study RNA, we discovered that this novel DNA nucleotide is 5-hydroxymethylcytosine (hmC), suggesting that TET1 may play a role in epigenetic regulation of the mammalian genome by mediating the oxidation of methylcytosine to hmC, a potential key step in the demethylation of 5-methylcytosine.
Components of Evolution Applied to Synthetic Molecules via DNA-Templated Synthesis
Several years ago we developed a new approach to the synthesis and discovery of small molecules that combines powerful aspects of natural biosynthesis and evolution with the flexibility of synthetic organic chemistry. We discovered that DNA hybridization induces the rapid, general, and sequence-programmed reaction of DNA-linked substrates, enabling DNA-templated organic synthesis (DTS). By translating DNA sequences into synthetic molecules, DTS enables selection, amplification, and mutation to be applied to molecules that can only be accessed through chemical synthesis. Over the past five years we performed the first large-scale, systematic application of DTS and in vitro selection, resulting in the discovery of many bioactive synthetic molecules and three new chemical reactions.
Discovery of bioactive macrocycles from DNA-templated synthesis and selection. Integrating many of our recent developments in DTS, we translated a library of DNA templates in one solution into a corresponding library of 13,824 DNA-linked macrocycles. We subjected this macrocycle library to highly efficient in vitro selections for binding to several dozen proteins of biomedical interest. The selection results identified macrocycles that inhibit kinases implicated in human cancers and a metalloprotease implicated in diabetes. Two families of macrocyclic Src inhibitors (IC50 < 1 µM) exhibit remarkable selectivity for Src over other human kinases. We improved the potency of this class of Src inhibitors to IC50 ≤ 1 nM and confirmed that the resulting compounds can inhibit Src in human cells. These results established the use of DNA-templated synthesis and in vitro selection to discover small molecules that modulate biological activities, an approach that has since been industrialized. In a collaborative effort with Professor Markus Seeliger (SUNY Stony Brook), the X-ray structure of Src kinase complexed with one of these macrocyclic inhibitors was solved, revealing detailed molecular insights into the macrocycles’ binding, inhibition, and unusual specificity.
DNA-encoded reaction discovery. In the past five years we developed a second-generation DNA-encoded reaction discovery system and used it to discover two new reactions, validating a non-hypothesis-driven approach to reaction discovery that has now been adopted by some synthetic chemistry groups. The first reaction is a mild Au(III)- or triflic acid–catalyzed hydroarylation of olefins with indoles. The second is a visible-light-triggered, Ru(II)-catalyzed reduction of azides to amines that has remarkable functional group tolerance. This azide reduction can be performed on nucleic acid and carbohydrate substrates and in the presence of protein enzymes, without modifying any groups present on these macromolecules beyond the azide. This reaction provides a means of photouncaging a widely used chemical handle in a manner that is compatible with biomolecules.
RDPCR and IDPCR. We developed two replacements for conventional in vitro selections, reactivity-dependent PCR (RDPCR) and interaction-dependent PCR (IDPCR), that have transformed the way we use DNA-linked molecules. RDPCR directly links bond formation with the selective PCR of DNA encoding reactive library members without the need for substrate immobilization or solid-phase capture and elution steps. Reaction discovery using RDPCR resulted in a new Pd(II)-catalyzed allene-azide coupling reaction that generates azidolactones, useful intermediates that would otherwise be more challenging to synthesize. Likewise, IDPCR replaces solid-phase binding selections and enables for the first time productive ligand-target combinations to be identified in a single experiment from combined libraries of small-molecule ligands and targets.