Posttranscriptional processing of primary gene transcripts (pre-mRNA) is the major source of transcriptome and proteome diversity necessary to establish cell type- and developmental stage-specific gene expression patterns, and to adapt to environmental stimuli. Protein-coding messenger RNAs (mRNAs) in eukaryotes are produced from pre-mRNAs (historically called hnRNAs) by extensive processing, including removal of introns by splicing and 3'-end cleavage and polyadenylation. Splicing is specified by splice sites, and cleavage and polyadenylation by polyadenylation signals (PASs); these elements can be used in various combinations to produce differentially regulated mRNAs encoding many protein isoforms from the same gene. Notably, it has recently been recognized that cryptic PASs throughout the gene pose a major threat to transcriptome integrity and must be silenced to prevent destructive premature cleavage and polyadenylation (PCPA).
Our research concerns the two major classes of cellular components that mediate pre-mRNA processing, pre-mRNA protection, and mRNA functions: RNA-binding proteins (RBPs) and small nuclear ribonucleoproteins (snRNPs). RBPs have fundamental roles in gene expression and cell physiology because all RNAs in cells exist and function as RNA-protein complexes (ribonucleoproteins or RNPs). snRNPs, the RNP complexes of noncoding small nuclear RNAs (snRNAs), are the major subunits of spliceosomes and are crucial for pre-mRNA processing. Our current studies focus on several aspects of RBP and snRNP function and their relevance to disease.
RBPs, hnRNPs, and a Pathway from Transcription Sites to Ribosomes
As they are transcribed, nascent gene transcripts become associated with large amounts of protein, collectively called hnRNP proteins. The resulting chromatin-attached hnRNP fibrils can give active genes a prominent appearance, noted by 19th-century cytologists as "lampbrush" chromosomes in amphibian oocytes and "puffs" in dipteran polytene chromosomes. By the mid- to late 20th century, it became clear that mRNAs are processed from pre-mRNAs within these hnRNP fibrils. However, the fragility of these large macromolecular structures and shortcomings in experimental methods hampered identification of hnRNP proteins and their functions.
In earlier work, we used ultraviolet (UV) light to photoactivate RNAs and cross-link them to bound proteins in living cells. We then purified pre-mRNAs and mRNAs under protein-denaturing conditions, thereby isolating only proteins that were cross-linked to RNA. Because this predated sensitive protein mass spectrometry, we immunized mice with the cross-linked material and generated monoclonal antibodies to many RBPs. This allowed us to identify and clone the principal hnRNP and mRNP proteins (>20) and led to the discovery of the major RNA-binding motifs: the RNP consensus (RBD/RRM), the KH domain, and the RGG domain. Genomic sequences located these motifs in an enormous assortment of RBPs (>5 percent of human genes) involved in every aspect of RNA biogenesis and function, linking their perturbations to many diseases. This connection emerged from identification of the fragile X mental retardation syndrome (FMR1) protein as a (KH-domain) RNA-binding protein, an activity we showed is impaired by a patient's mutation.
Biochemical studies showed that each hnRNP and mRNP protein has a distinct RNA-binding specificity, which drives assembly of a specific constellation of RBPs on each pre-mRNA, sculpting its presentation to the processing machineries and determining the mRNA it produces and its fate (Figure 1). In cytological experiments, we visualized the binding of hnRNP proteins to pre-mRNAs at chromosomal transcription sites and observed that many hnRNP proteins remain bound through splicing and transport to the cytoplasm, where, as we proposed, they also have functions in mRNA translation, localization, and stability. These studies described nucleocytoplasmic hnRNP shuttling, later generalized to numerous proteins, and traced a coordinated mRNA biogenesis path from gene to mRNA translation on ribosomes. We have delineated novel nuclear import and export signals, as well as the transport receptors (transportins) that choreograph RNP trafficking.
A major challenge is to understand how cells form a specific RNP on each RNA from their vast assortment of RBPs and RNAs, and how their perturbations cause numerous diseases.
SMN Complex: RNA-Protein Chaperone, snRNP Assembly, Spinal Muscular Atrophy
A remarkable case of specific RNP assembly is the construction of a heptameric Sm protein ring (Sm core) on each spliceosomal snRNA (U1, U2, U4, U5, U11, U12, U4atac), as it requires mobilizing the seven Sm RBPs to a short nonunique RNA segment exclusively on snRNAs. Sm cores are essential for snRNPs' functions, and their assembly is a key step in snRNP biogenesis. We have established that Sm core assembly is mediated by the SMN (survival of motor neurons) complex, a multiprotein complex identified in our laboratory. The SMN complex functions as an RNA-protein chaperone, a first-of-its-kind RNP assembly device. This was unexpected, as RNPs were previously believed to form by self-assembly. Indeed, Sm proteins have the propensity to nonspecifically assemble Sm cores on RNAs, which undoubtedly would be deleterious (Figure 2).
Using biochemical assays and inhibitors identified by high-throughput screening, we have dissected subunits (composed of SMN and Gemins) and intermediates in a stepwise snRNP biogenesis pathway. The atomic resolution structure of a key intermediate provided important mechanistic insights, revealing how cells distinguish snRNAs from other RNA classes and how illicit Sm core assembly is prevented, and explaining the basis of several mutations that cause spinal muscular atrophy (SMA) (Figure 3).
The SMN complex is required in all eukaryotic cells, yet SMN deficiency causes SMA, a devastating motor neuron degenerative disease that is the leading hereditary cause of infant mortality. Our studies on the SMN complex are motivated by its central role both in RNA metabolism and in SMA. We aim to advance understanding of the pathogenesis of this disease and the prospects for its therapy. This includes molecular profiling of SMA motor neurons, detailed studies of the structure and mechanism of the SMN complex, and discovery of drug targets to increase SMN and SMN-complex activity in the cells of SMA patients.
U1 snRNP Protects Pre-mRNAs from Premature Termination and Determines mRNA Length
Stimulated by the observation of an altered snRNP repertoire and splicing abnormalities in SMA cells, we systematically inactivated individual snRNPs and probed for their transcriptome effects. Despite their 1:1 stoichiometry in the spliceosome, snRNPs are not equimolar in cells; it has been noted, but unexplained, ever since snRNAs were first discovered in the 1960s that U1 snRNA is much more abundant than other snRNAs in human cells. Our inactivation experiments led to the surprising observation that U1 snRNP (U1) inhibition caused premature termination in the majority of pre-mRNAs. We found that in addition to and separate from its function in 5' splice-site recognition, the first step in splicing, U1 is a suppressor of PCPA from cryptic PASs in introns. This activity, which we have termed telescripting, allows transcription to proceed farther and is necessary for full gene-length transcription. Effective telescripting depends on additional U1s binding and protecting large introns, providing a plausible explanation for U1 overabundance (Figure 4).
U1 telescripting is a crucial and overarching process in gene expression. It also plays a role in regulating mRNA length, by affecting alternative PAS selection in 3' untranslated regions (3' UTRs). The wide-ranging impact of U1 telescripting in biology opened many areas of investigation, including its potential role in the widespread mRNA 3' UTR shortening in cancer, proliferating cells, and activated immune cells and neurons. A major focus of our lab is on understanding the mechanism by which U1 silences PASs.
The research in this laboratory was supported in part by the Association Française contre les Myopathies (AFM) and the National Institute of General Medical Sciences.
As of March 21, 2016