Genomics and biogenesis. Our lab was among the first to report the abundance of microRNAs (miRNAs) in animals and the presence of miRNAs in plants. With help from collaborators, we went on to identify many of the miRNAs of model organisms (including Caenorhabditis elegans, fruit flies, Arabidopsis, and mice) and showed that humans have hundreds of miRNA genes. We also found miRNAs and other classes of small regulatory RNAs in deeply branching animals and land plants.
The genes that encode miRNAs produce primary transcripts that fold back on themselves to form distinctive hairpin structures from which the miRNAs are processed. While analyzing data from high-throughput sequencing of small RNAs, we discovered types of miRNA precursors that bypass the specialized machinery that generates most miRNAs. In addition, we developed protocols to confidently identify the precursor hairpins that are recognized by the conventional machinery to give rise to miRNAs. We have since turned our attention to how this recognition occurs, i.e., how the cellular machinery determines which of the many hairpin-containing transcripts enter the miRNA biogenesis pathway. Using combinatorial and biochemical approaches, we have discovered that primary-sequence determinants help distinguish miRNA-containing transcripts from other types of hairpin-containing transcripts, and we have expanded the list of known structural determinants that are also recognized. With this menu of sequence and structural features, we can now reliably design miRNA genes de novo, without reference to a known gene.
Regulatory targets. The realization that genomes of animals and plants each have hundreds of miRNA genes raised the question of what all these tiny RNAs are doing. To address this question, we have developed increasingly reliable methods for predicting miRNA targets. In plants, we found that miRNAs pair extensively to their targets and that evolutionarily conserved targets are mostly genes that play important roles during development. In animals, we have found a few cases of extensive pairing, but miRNAs usually recognize shorter sites (typically only 7 or 8 nucleotides long) that pair with a short region near the miRNA 5′ end containing the "seed" sequence (miRNA nucleotides 2-7). In collaboration with Christopher Burge (Massachusetts Institute of Technology), we showed that more than half of human protein-coding genes have been under selective pressure to maintain pairing to miRNAs. When nonconserved targeting is also considered, the fraction of human genes regulated by miRNAs grows even higher.
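To make the short site types concrete, here is a minimal Python sketch that scans a 3′-UTR for canonical seed-matched sites. It assumes the commonly used nomenclature (8mer: pairing to miRNA nucleotides 2-8 plus an A opposite nucleotide 1; 7mer-m8: pairing to nucleotides 2-8; 7mer-A1: pairing to nucleotides 2-7 plus that A). The function names are hypothetical, 6mer and noncanonical sites are ignored, and it is not TargetScan code; it also ignores the context features that influence how well a site works.

```python
# Minimal sketch: scan a 3'-UTR for canonical miRNA seed-matched sites.
# Site-type definitions follow the common 8mer / 7mer-m8 / 7mer-A1 nomenclature;
# function names are hypothetical and this is not TargetScan's implementation.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Reverse complement of an RNA sequence (5'->3' in, 5'->3' out)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_seed_sites(mirna: str, utr: str) -> list[tuple[int, str]]:
    """Return (0-based start in utr, site type) for each 7mer or 8mer site.

    6mer-only matches, noncanonical sites, and site-context features
    are all ignored in this sketch.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    core = revcomp(mirna[1:7])  # pairs with miRNA nucleotides 2-7 (the seed)
    m8 = COMPLEMENT[mirna[7]]   # UTR base that would pair with miRNA nucleotide 8
    sites = []
    start = utr.find(core)
    while start != -1:
        end = start + len(core)
        has_m8 = start > 0 and utr[start - 1] == m8  # base just 5' of the core
        has_a1 = end < len(utr) and utr[end] == "A"   # A opposite miRNA nucleotide 1
        if has_m8 and has_a1:
            sites.append((start - 1, "8mer"))
        elif has_m8:
            sites.append((start - 1, "7mer-m8"))
        elif has_a1:
            sites.append((start, "7mer-A1"))
        start = utr.find(core, start + 1)
    return sites


if __name__ == "__main__":
    let7a = "UGAGGUAGUAGGUUGUAUAGUU"
    utr = "AAACUACCUCAGGGUACCUCAUUUCUACCUCGG"
    print(find_seed_sites(let7a, utr))
    # [(3, '8mer'), (14, '7mer-A1'), (24, '7mer-m8')]
```

Scanning for the 6-nt seed match first and then inspecting its two flanking bases keeps each site type a simple extension of the same core, which mirrors how the site types are defined.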
Although a 7-nucleotide site matching a miRNA often mediates some repression, it is not always sufficient to do so, which indicates that other characteristics help specify targeting. For example, messenger RNA (mRNA) regions surrounding the sites can influence site accessibility and thus the efficacy of repression. We have combined such features of site context, as well as other features that make some miRNAs more effective than others, to construct models that predict site performance. In mammals, our model is as informative as the best high-throughput in vivo crosslinking approaches and thus provides an important resource for choosing which of the many miRNA–target relationships are most promising for experimental follow-up. Current predictions for mammals, zebrafish, flies, and nematodes can be viewed at www.TargetScan.org.
Regulatory effects and biological functions. To augment and inform our computational analyses of miRNA targeting, we have adapted high-throughput approaches to measure the effects of mammalian miRNAs on target mRNA levels, poly(A)-tail lengths, translational efficiency, and protein accumulation. Prior to these measurements, the prevailing view was that miRNAs mediate repression mostly through translational inhibition, with relatively little effect on target mRNA levels. However, our measurements showed that, although some translational repression can be detected, mammalian miRNAs predominantly act to decrease target mRNA levels. Messages from hundreds of genes are directly repressed by individual miRNAs, albeit each to a modest degree, indicating that for most interactions, miRNAs act as rheostats to adjust and optimize the mRNA output previously generated by transcription and pre-mRNA processing.
By disrupting the regulation of particular targets, several groups, including ours, have demonstrated the importance of miRNA-directed regulation during each stage of plant development. In addition, many groups have demonstrated the importance of miRNAs in animals. Indeed, our observation that most human genes are conserved targets of miRNAs indicates that it will be difficult to find a developmental process or disease that is not somehow influenced by miRNAs. We have contributed to the understanding of miRNA function during blood cell, brain, and skeletal development, as well as cancer. For example, disrupting the miRNA regulation of the mouse Hmga2 oncogene enhances oncogenic transformation. Because many human tumors possess defective HMGA2 genes that lack the miRNA complementary sites, our work indicates that losing miRNA regulation of this oncogene contributes to some human cancers.
Other Small Regulatory RNAs
In addition to new miRNAs, our analysis of small RNAs found in animals, fungi, and plants uncovered other classes of endogenous small RNAs. These include the 21U- and 26G-RNAs of nematodes, the heterochromatic small interfering RNAs (siRNAs) of fungi, and what are now recognized as trans-acting siRNAs of plants. Experimental follow-up, mostly by other labs, has shown that, like miRNAs, these each play important gene-regulatory roles in RNA-silencing pathways that have emerged in their respective evolutionary lineages.
RNA Interference in Budding Yeast
The miRNA pathways of plants and animals appear to have evolved independently, both as elaborations of a core RNA-silencing pathway known as RNA interference (RNAi). During RNAi, the Dicer endonuclease processes long double-stranded RNA into siRNAs, which are then loaded into the Argonaute protein, where they direct the cleavage of mRNA targets. This pathway is present in most eukaryotes, where it plays important roles in defending against viruses and transposons, but it was initially thought to be absent in budding yeast. In collaboration with Gerry Fink (Whitehead Institute and Massachusetts Institute of Technology), we found that although RNAi was lost in a recent ancestor of Saccharomyces cerevisiae, it is present in other budding yeasts, including Saccharomyces castellii (a close relative of S. cerevisiae).
The discovery of RNAi in budding yeast has opened many new opportunities for exploring the mechanism, biology, and evolution of the pathway. We found that introducing S. castellii Dicer and Argonaute into S. cerevisiae restores RNAi in this species. The reconstituted pathway strongly silences endogenous retrotransposons, which explains why it has been retained in S. castellii. In addition, endogenous double-stranded RNA (dsRNA) elements known as Killer elements are not retained in the RNAi-reconstituted S. cerevisiae strain, which renders that strain susceptible to the toxin secreted by strains that carry Killer. These results provide an explanation for why the presence of RNAi is so variable in the budding yeast lineage: retaining the pathway provides defense against transposons, whereas losing the pathway enables acquisition and retention of Killer.
In collaboration with Dinshaw Patel (Memorial Sloan Kettering Cancer Center), whose lab solved the structures of the Dicer and Argonaute proteins, we have been studying the biochemical mechanism of RNAi in budding yeasts. We first determined the mechanism of the budding yeast Dicer (Dcr1), which differs from that of other Dicer proteins. Dcr1 dimers bind cooperatively along the dsRNA substrate and cleave at precise intervals determined by the distance between consecutive active sites. Thus, unlike canonical Dicers, which successively remove siRNA duplexes from the dsRNA termini, Dcr1 initiates processing in the interior and works outward. More recently, we have been studying the budding yeast Argonaute (Ago1), which resembles Argonaute proteins of animals and plants. The structure of this protein in complex with its guide RNAs revealed how it positions these RNAs for target recognition and how a catalytic glutamate residue inserts into the active site to complete a previously unrecognized catalytic tetrad.
Messenger RNAs, Poly(A) Tails, and Developmental Transitions
MicroRNAs and other factors that mediate post-transcriptional gene regulation typically recognize elements in the 3′-untranslated regions (3′-UTRs) of mRNAs. Adding another dimension to this regulation is the use of alternative 3′-UTRs, which can either include or exclude the regulatory elements. To facilitate global analyses of these regulatory phenomena, we have developed a high-throughput method to accurately map the poly(A) sites of transcripts, which enables us to identify the ends of mRNAs and quantify the use of alternative isoforms in different cells, tissues, or developmental stages. We have applied this method to expand and correct the mRNA annotations of C. elegans, zebrafish, mice, and humans. These studies have substantially improved our prediction of miRNA targets in each of these species and have revealed the regulatory consequences of alternative 3′-UTRs in mammalian cells.
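The isoform-quantification step can be sketched as follows, assuming the poly(A)-site reads have already been aligned and reduced to the genomic coordinate of each read's 3′ end on the gene's strand. Each read is assigned to the nearest annotated poly(A) site within a tolerance window (the 20-nt default is an arbitrary, hypothetical choice), and the fraction of assigned reads per site estimates relative isoform usage. This is an illustration under those assumptions, not the lab's pipeline.

```python
from collections import Counter


def isoform_usage(read_ends, pa_sites, window=20):
    """Fraction of poly(A)-site reads supporting each annotated site of one gene.

    read_ends: genomic coordinates of read 3' ends (same strand as the gene).
    pa_sites:  annotated poly(A)-site coordinates for the gene.
    window:    hypothetical tolerance (nt) for assigning a read to a site.
    """
    counts = Counter()
    for pos in read_ends:
        # Assign the read to the closest annotated site, if it is close enough.
        nearest = min(pa_sites, key=lambda site: abs(site - pos))
        if abs(nearest - pos) <= window:
            counts[nearest] += 1
    total = sum(counts.values())
    if total == 0:
        return {site: 0.0 for site in pa_sites}
    return {site: counts[site] / total for site in pa_sites}


if __name__ == "__main__":
    # Proximal site at 1000, distal site at 1500; the read at 2000 is discarded.
    print(isoform_usage([998, 1001, 1003, 1499, 2000], [1000, 1500]))
    # {1000: 0.75, 1500: 0.25}
```

Comparing these fractions across cells, tissues, or stages then shows whether a gene shifts toward a shorter or longer 3′-UTR isoform.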
Maternal gene products deposited into metazoan eggs regulate embryonic development before the zygotic genome is activated. In plants, an analogous period of prolonged maternal control over embryogenesis was also thought to occur. Overturning this idea, we showed that the vast majority of Arabidopsis mRNAs are produced in near-equal amounts from both maternal and paternal alleles, even during the initial stages of embryogenesis.
In animals, we found a transition in translational control that follows the maternal-to-zygotic shift in transcriptional control. Key to this discovery was our development of a high-throughput method for measuring the lengths of poly(A) tails found at the ends of most eukaryotic mRNAs. When we used this method to measure tail lengths of millions of individual mRNAs isolated from early zebrafish, frog, or fly embryos, we found a very strong correlation between tail length and translational efficiency, as would be expected if mRNAs with longer tails were more efficiently translated. However, this strong coupling diminished at gastrulation and was absent in nonembryonic samples, which indicated a previously unrecognized developmental switch in the nature of translational control.
In vitro, some RNAs can form stable four-stranded structures known as G-quadruplexes, and these structures have been implicated in post-transcriptional gene regulation and diseases. However, direct evidence for quadruplex formation in cells has been lacking. We recently identified thousands of endogenous sequences that can fold into G-quadruplexes in vitro but showed that, contrary to previous assumptions, these G-quadruplex regions were overwhelmingly unfolded in eukaryotic cells. In contrast, the same regions were folded when ectopically expressed in Escherichia coli, where they impaired translation and growth. This toxicity helps explain why few sequences that could fold into G-quadruplexes were detected in the transcriptomes of E. coli or the two other bacteria examined. Thus, we hypothesize that eukaryotes have a robust and surprisingly effective machinery that globally unfolds RNA G-quadruplexes, whereas bacteria have instead undergone evolutionary depletion of G-quadruplex–forming sequences.
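For readers who want to screen a transcript for candidate G-quadruplex sequences, a widely used sequence heuristic looks for four runs of three or more guanines separated by loops of 1 to 7 nucleotides. The sketch below applies that pattern with Python's re module. It only flags sequence potential: it is not the in vitro folding or in-cell measurement described above, and the function name is hypothetical.

```python
import re

# Common sequence heuristic (an assumption, not the lab's assay):
# four G-runs of length >= 3 separated by loops of 1-7 nucleotides.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGU]{1,7}G{3,}){3}")


def find_g4_candidates(seq: str) -> list[tuple[int, int, str]]:
    """Return (start, end, matched sequence) for non-overlapping G4 motifs."""
    seq = seq.upper().replace("T", "U")  # accept DNA input as well
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    print(find_g4_candidates("AAGGGAGGGAGGGAGGGAA"))
    # [(2, 17, 'GGGAGGGAGGGAGGG')]
    print(find_g4_candidates("GGGAAAAAAAAGGGAGGGAGGG"))
    # [] (the first loop is 8 nt, too long for the pattern)
```

Because finditer reports non-overlapping matches, long G-rich stretches that could fold in more than one register are reported only once.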
Long Noncoding RNAs
In humans and other mammals, thousands of genomic loci produce long intervening noncoding RNAs (lincRNAs), which resemble mRNAs yet do not encode proteins. To better understand the evolution of lincRNAs, we identified lincRNAs in zebrafish and C. elegans. Although our set of zebrafish lincRNAs shares many characteristics with mammalian lincRNAs, only ~30 have detectable sequence similarity with putative mammalian orthologs, typically restricted to a single short region of high conservation. Other lincRNAs are transcribed from conserved genomic locations even though we could detect no sequence conservation. We have been investigating the functions of some of the conserved lincRNAs, with particular interest in one that pairs to a miRNA in a pattern that is unusually extensive and remarkably conserved in vertebrate species.
Work on miRNAs and other RNAs is supported in part by grants from the National Institutes of Health.
As of January 23, 2017