RNA has many important functions in the cell, from the transfer of genetic information to the catalysis of reactions that are essential for gene expression. Often it is the folded tertiary structure of an RNA, rather than its primary sequence, that is essential for function. Given the ubiquitous role of folded RNA in biology, we have set out to understand some basic principles: What are the building blocks for RNA tertiary structure and how is it energetically stabilized? What are the pathways for reaching the folded state? How are RNA and ribonucleoprotein assemblies unfolded by helicase enzymes? How can computational and experimental tools be combined to better understand these problems?
Many of our studies focus on the self-splicing group II intron, a ribozyme that catalyzes its own excision from precursor mRNA and has been shown to be a mobile genetic element that invades duplex DNA. The biological role and biotechnological applications of group II introns are of great interest. They are also excellent models for studying RNA folding because their tertiary structure consists of modular domains that assemble through the formation of novel, non-Watson-Crick interactions. Using a combination of biophysical and chemogenetic techniques, we are identifying the molecular structure of these interactions, determining the free energy for their formation, and monitoring their appearance along the RNA-folding pathway.
Our previous work has involved identifying functional domains of the intron, dissecting them into separate modules, and then reassembling these as active ribozymes that recapitulate reactions important in self-splicing. We then developed kinetic frameworks for interpreting the enzymatic behavior of these ribozymes and used chemical modifications to identify the atoms that are important for folding and for catalysis. It is now possible to put the puzzle pieces back together again at the molecular level, determining which pairs of essential atoms interact with each other and which tertiary interactions form at specific points in time. To this end, we have exploited chemogenetic techniques such as nucleotide analog interference suppression (NAIS). By combining this approach with logic that is similar to that of classical genetics, we have identified pairs of atoms that participate in tertiary interactions. From sets of these interactions, we can deduce the molecular structure of interacting RNA motifs. The advantage of this approach, particularly when it is accompanied by data from short-range photo-cross-linking, is that each interaction is known to be functional, its energetic signature is known, and its role in the mechanism has been specified by the type of selective pressure applied to the system.
In this way, we have identified a diversity of motifs that mediate interaction between domain 5 (widely considered the catalytic heart of the intron), domain 3 (a catalytic enhancer), domain 1 (the active-site scaffold), and domain 6 (which contains the branchpoint nucleophile for splicing). For example, the λ-λ' interaction creates an extended minor-groove triple helix between domain 5 and single-stranded nucleotides in domain 1. This triplex aligns the 5' splice site and positions it precisely over the active-site functional groups that participate in the catalysis of phosphodiester cleavage. By combining NAIS data with constraints derived from phylogenetic and photo-cross-linking studies, we have constructed a complete three-dimensional model of a functional group II intron core, and we have shown that group II introns use a single active site to catalyze both steps of splicing. These studies have been complemented by metal ion–mapping experiments and nuclear magnetic resonance (NMR) studies that have identified the location of metal ions that are critical for folding and catalysis by the intron. We are currently learning how other substructures nucleate around the central core and how these structures change as a function of time along the folding pathway or during splicing. These biochemical approaches are complemented by the application of standard techniques in structural biology, by which we are elucidating group II intron structural features at high resolution. For example, we recently solved the solution structure of domain 5 by NMR, which not only provided molecular details of a group II intron active site but also revealed a novel RNA motif in its core.
As interesting as the three-dimensional structure of a molecule may be, it is just as important to understand how a particular three-dimensional conformation is achieved. For this reason, we are studying the folding pathway of the group II intron. Unlike many other RNAs, whose folding is often limited by kinetic traps and the formation of misfolded intermediates, group II introns fold directly to the native state through a simple two-state mechanism. This finding was extremely surprising, given that group II introns are among the largest ribozymes in nature (~1,000 nucleotides) and their architecture is highly complex. How group II intron folding might affect biological function was highlighted by recent work on the Av intron, a new group II intron from the soil bacterium Azotobacter vinelandii (Av). The Av intron splices efficiently only at high temperature, which is notable because it lies in a heat-shock gene. This may imply that the stability of a folded RNA could be used as a switch for regulating the heat-shock response. These and other studies of group II introns have uncovered new paradigms for RNA folding that promise to reveal more about the capabilities of RNA as a folded polymer and biomolecular building block.
RNA metabolism requires that RNA folding be coordinated with RNA unfolding as a function of time or the presence of biochemical signals. We have been studying the mechanism of RNA unwinding by a class of helicases (the DExH/D subgroup of helicase superfamily 2) that are involved in all aspects of RNA metabolism and in viral replication. Our work has focused on two helicases that are essential for the replication of vaccinia virus and hepatitis C virus (HCV). These helicases serve as model systems for defining the behavior of DExH/D proteins in general, and they are also important drug targets in the effort to develop antiviral therapeutics. Using transient kinetics and chemogenetic approaches, we have examined the mechanism by which these proteins translocate, unwind nucleic acid, and strip proteins from RNA strands.
By linking the extent of RNA unwinding with the utilization of ATP, we have shown for the first time that DExH/D proteins are true molecular motors that act on RNA. We find that the NPH-II protein from vaccinia virus is a processive, directional motor that unwinds RNA with a kinetic step size of 6 base pairs and a quantifiable translocation rate constant that depends on the nature of the metal ion cofactor. Remarkably, the processivity of NPH-II (the tendency of the enzyme to proceed along the polymer rather than falling off) increases as ATP concentration increases, demonstrating that the binding of ATP, and not just the conformational changes that occur upon hydrolysis, can be important for processive motor function. By examining NPH-II unwinding of chemically modified RNA substrates, we have shown that the protein tracks exclusively on the sugar-phosphate backbone of the substrate-loading strand while stripping away the top strand without regard to its chemical identity. Obstacles other than a nucleic acid strand can also be displaced from the loading strand: we have demonstrated that NPH-II actively displaces proteins from RNA in an ATP-dependent manner. This result establishes a precedent for a new enzymatic activity (termed RNPase function) that is widely hypothesized to be important for the function of macromolecular machines such as the spliceosome. In these and other experiments, the NPH-II helicase has been an invaluable model system for demonstrating the numerous capabilities of proteins in the DExH/D family.
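To make the relationship between kinetic step size and processivity concrete, the following is a minimal sketch of a generic stepping model, not the analysis used in the work described above. It assumes that at every step the helicase either advances (rate constant `k_forward`) or dissociates (rate constant `k_off`); both rate constants and all function names are hypothetical. Only the 6-base-pair step size is taken from the text.

```python
import math


def processivity(k_forward: float, k_off: float) -> float:
    """Probability that the motor takes another step rather than falling off.

    Both rate constants are hypothetical inputs in the same units (e.g. 1/s).
    """
    return k_forward / (k_forward + k_off)


def fraction_unwound(duplex_bp: int, step_bp: int, p: float) -> float:
    """Fraction of bound helicases expected to unwind a duplex completely.

    The duplex is crossed in ceil(duplex_bp / step_bp) steps, and each step
    succeeds independently with probability p.
    """
    n_steps = math.ceil(duplex_bp / step_bp)
    return p ** n_steps


def mean_bp_unwound(step_bp: int, p: float) -> float:
    """Average number of base pairs unwound before dissociation (p < 1)."""
    return step_bp * p / (1.0 - p)


if __name__ == "__main__":
    STEP_BP = 6  # kinetic step size reported in the text
    # Illustrative rate constants only; they are not measured values.
    p = processivity(k_forward=9.0, k_off=1.0)
    for length in (12, 36, 72):
        print(length, "bp:", round(fraction_unwound(length, STEP_BP, p), 3))
    print("mean bp per binding event:", round(mean_bp_unwound(STEP_BP, p), 1))
```

In a model of this kind, any condition that raises the forward rate constant relative to the dissociation rate constant raises the per-step processivity, and the fraction of long duplexes unwound is especially sensitive to that change because the per-step probability is compounded over many steps.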
The NS3 helicase from hepatitis C virus has also been important in understanding the unwinding activity of DExH/D proteins. Despite its obligate role in cytoplasmic RNA replication by the virus, we find that NS3 is a highly efficient DNA helicase, with RNA helicase activity that must be enhanced by the addition of cofactor proteins. Phylogenetic analysis indicates that the DNA-unwinding activity of NS3 is not vestigial and has been specifically acquired by the virus, presumably because it confers some form of selective advantage. Because chronic HCV infection is associated with the development of hepatocellular cancer, robust DNA helicase activity by NS3 has important implications for long-term consequences of HCV infection and for the development of effective therapeutic strategies.
In addition to its role in human disease, the HCV NS3 helicase is a valuable system for studying helicase function in the context of large macromolecular machines. NS3 is active alone and in complex with other components of the replication machinery that modulate its behavior (such as the 4A cofactor and the 5B polymerase). Conventional methods for dissecting helicase mechanism cannot be applied to deduce the molecular mechanism of NS3 helicase activity, its role in complex contexts, or its various polymer specificities. We have therefore developed novel combinatorial approaches for monitoring helicase function. We first introduce random nicks in the top strand of long helicase substrates and then initiate synchronized unwinding of the substrate pool in a quench-flow apparatus. The resultant time courses reveal the relative speed and processivity of the helicase (and/or its complexes) at each nucleotide on a substrate. In this way, we have evaluated the relative unwinding efficiency for hundreds of different duplex lengths simultaneously in the same reaction. Like the cytoskeletal motor proteins moving along protein filaments, the NS3 helicase moves discontinuously along RNA polymers, undergoing periodic cycles of pausing and rapid unwinding with defined rate constants and step sizes. We have extended these findings through single-molecule studies performed in collaboration with the laboratory of Carlos Bustamante (HHMI, University of California, Berkeley). Together, these combinatorial and single-molecule studies of RNA unwinding by NS3 have provided the first high-resolution glimpses of helicase motion, challenged basic assumptions about nucleic acid unwinding, and highlighted the many parallels between nucleic acid motors and cytoskeletal motors such as kinesin.
Experimental approaches in our laboratory are complemented by computational studies. For example, we have collaborated with Barry Honig (HHMI, Columbia University) to develop and implement new methods for calculating the electrostatic properties of RNA. Like the conventional GRASP program that is routinely used for calculating properties of proteins, these new methods now make it possible to calculate the electrostatic surface potentials of RNA structures and to identify metal-binding sites. Unexpectedly, we observe deep holes in the electrostatic potential contours of RNA molecules. The cavities occur at known sites of RNA-RNA interaction and may therefore minimize electrostatic repulsion between interacting RNAs.
Independently, we have developed robust methods for predicting macromolecular interactions from primary sequence data (a computational two-hybrid approach). We have also developed programs for calculating and describing the conformational states of RNA molecules. This was achieved by reducing the dimensionality of RNA conformational space from seven backbone torsion angles to two pseudotorsions, resulting in a graphical representation that is much like the analogous Ramachandran plot for protein conformation. When pseudotorsions are plotted against each other, one observes clusters of regions that correspond to specific structural motifs. This convention is now widely used as an automated approach (AMIGOS) for analyzing new structures to determine regions of particularly unusual structure or regions that require further refinement. The technique has been adapted to create a rapid search engine for scanning large RNA structures to classify constituent motifs and to identify conformational changes within RNA structures (PRIMOS). We have used an extension of this approach to discover novel RNA structural motifs; the resultant program (COMPADRES) is the first automated algorithm for identifying and characterizing new elements of RNA structure. (For download information and tutorials, see http://www.pylelab.org.)
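The reduction from seven backbone torsions to two pseudotorsions can be illustrated with a short calculation. The sketch below is not code from AMIGOS, PRIMOS, or COMPADRES. It assumes the commonly used definitions, in which η is the dihedral angle through C4'(i-1), P(i), C4'(i), P(i+1) and θ is the dihedral angle through P(i), C4'(i), P(i+1), C4'(i+1). The function names are hypothetical, and the coordinates in the example are arbitrary points rather than atoms from a real structure.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _scale(a: Vec, s: float) -> Vec:
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle about the p1-p2 axis, in degrees, in the range [0, 360)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the axis
    # Components of the two outer bonds perpendicular to the central axis.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x)) % 360.0


def pseudotorsions(c4_prev: Vec, p_i: Vec, c4_i: Vec,
                   p_next: Vec, c4_next: Vec) -> tuple[float, float]:
    """Return (eta, theta) for nucleotide i from five backbone atom positions."""
    eta = dihedral(c4_prev, p_i, c4_i, p_next)
    theta = dihedral(p_i, c4_i, p_next, c4_next)
    return eta, theta


if __name__ == "__main__":
    # Arbitrary test points, not real atomic coordinates.
    eta, theta = pseudotorsions((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                                (0.0, 0.0, 1.0), (0.0, 1.0, 1.0),
                                (0.0, 1.0, 2.0))
    print(f"eta = {eta:.1f}, theta = {theta:.1f}")
```

Applying `pseudotorsions` to every internal nucleotide of a structure yields one (η, θ) point per residue, which is the kind of two-dimensional scatter in which clusters corresponding to structural motifs can be sought.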