When people say that "a genome has been deciphered," they mean that it has been sequenced—that is, converted from a molecular form to a string of nucleotide symbols stored in a computer. Proper deciphering means finding all genes in the genome and describing their function and regulation. Traditional experimental approaches are slow and laborious, and it is clear that most genomes will never be studied in this way in any detail. It has become apparent that comparing many different genomes is a very powerful technique for genome annotation. It allows us to find genes and map their exact boundaries; predict the function of proteins they encode and the interaction between the encoded proteins; find regulatory sites in DNA and describe gene regulation; reconstruct metabolic and regulatory systems; and, eventually, given its genome, predict an organism's phenotype (metabolism, physiology, response to external stimuli, and changes in internal state).
Contrary to common misconceptions, comparative genomics approaches are not restricted to transferring information about known genes to similar, related genes in new genomes—that is just the first step. The existing techniques allow one to assign a function to families of genes containing no experimentally studied representatives, completely describe new regulatory systems, and identify and fill gaps in existing knowledge. In our studies, which HHMI has supported since 2000, we have been able to characterize a new type of regulatory structure, so-called riboswitches, which are able to bind directly to small ligands. The riboswitches are present in all three domains of life (Eukarya, Bacteria, and Archaea) and, as a remnant of the RNA world, might represent the oldest regulatory system.
Analysis of regulatory sites in conjunction with other comparative genomic techniques may lead to nontrivial functional predictions that are subsequently validated experimentally. For example, our analysis of one specific regulatory system, the ZUR regulon, allowed us to predict changes in the protein composition of bacterial ribosomes under conditions of zinc starvation. Similarly, we have predicted new enzymes involved in the biosynthesis of fatty acids, thiamine (vitamin B1), and cobalamin (vitamin B12). However, the main strength of this approach is in the analysis of transporters. These essential proteins are relatively difficult to study experimentally, and thus much less is known about them than about enzymes. An additional complication is that transporter specificity may change very rapidly, and thus the standard similarity-based methods of gene annotation often fail. When similarity analysis is combined with the analysis of regulatory interactions, however, it can yield quite specific and reliable predictions. Already validated predictions include new transporters for methionine, riboflavin, and oligogalacturonides (products of pectin degradation by plant pathogens); dozens more have been published and await interested experimentalists.
In a recent collaboration with an experimental group from the Humboldt University in Berlin, Germany, we characterized a new type of transporter capable of both secondary (ATP-independent) and ATP-dependent transport of metal ions (nickel and cobalt), vitamins (biotin), and possibly other compounds. In these systems, one subunit is capable of secondary transport and, in many cases, appears in genomes as a single transporter. However, when joined by the ATPase subunit, it becomes more specific and effective.
Sometimes, bioinformatics analysis allows us to describe new regulatory systems in a very detailed manner. For example, starting with a new conserved motif upstream of ribonucleotide reductase genes in a diverse set of bacteria, we identified the transcription factor responsible, found other genes this factor regulated, demonstrated a link to the regulation of replication, and predicted the mechanism of regulation (repression by cooperative binding to tandem sites overlapping with promoters).
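To illustrate the first step of such an analysis, locating candidate binding sites once a conserved motif is known, here is a minimal sketch of a position weight matrix (PWM) scan. This is a generic textbook technique, not necessarily the procedure used in the study; the function names, the toy sites, the pseudocount, the uniform background, and the score threshold are all assumptions made for the example.

```python
import math

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned sites.

    The pseudocount and uniform background frequency are illustrative choices.
    """
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(width):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm

def scan_sequence(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous symbols
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Toy data only: these are NOT real binding sites.
toy_sites = ["ACGT", "ACGT", "ACGA"]
toy_pwm = build_pwm(toy_sites)
print(scan_sequence("TTACGTTT", toy_pwm, threshold=3.0))
```

In a real analysis the sites would come from upstream regions of orthologous genes in many genomes, and hits found next to each other upstream of a gene would be candidates for the tandem sites mentioned above.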
In a more theoretical area of research, we study the evolution of bacterial regulation on three different levels. At the gene level, we study the birth and death of individual binding sites, leading to expansion and contraction of regulons. At the protein level, we consider coevolution of regulatory proteins and their binding motifs in DNA. At the genome level, we try to understand the major events that shape extant complex systems: loss of regulators, changes of regulator specificity, introduction of new regulators, rewiring of regulatory cascades, and so forth. Using these approaches, we reconstructed the evolutionary history of transcription factors regulating iron homeostasis in alpha-proteobacteria (our colleagues at the University of East Anglia, United Kingdom, validated some of the specific predictions made in that study) and described the evolution of T-box RNA structural elements regulating the metabolism of amino acids in Firmicutes. We are also interested in the evolution of transcription factor families: in particular, why some families consistently have a single representative per genome, others have several representatives, and yet others are completely absent from some genomes but sometimes burst into dozens of members. Our final aim is to create a theory of regulatory evolution that is at least as developed as the current models of evolution of genes and proteins.
Another area of our research is alternative splicing in eukaryotic genomes, the process that generates multiple protein isoforms encoded by a single eukaryotic gene. We are testing the hypothesis that alternative splicing serves as a mechanism for generating diversity of proteins in evolution. Again, we have used several different approaches. First, we demonstrated that alternative exons and splice sites are much more likely to be lost or gained in evolution than constitutive ones, both in mammals (human, mouse, and dog) and in insects (two fruit flies and the malarial mosquito). Second, we showed that alternative regions, even if present in several genomes, evolve faster than constitutive ones; they accumulate more substitutions, both synonymous (retaining the encoded amino acid) and non-synonymous (changing it). Moreover, simultaneous analysis of the human and chimpanzee genomes and human polymorphisms demonstrated that alternative regions experience positive selection. Taken together, our observations imply that alternative splicing allows a gene to test new functions without sacrificing old ones. As more and more genomes from different taxonomic groups become available, we hope to be able to describe the evolution of alternative splicing in more detail; estimate the rates of exon, intron, and splice site loss and gain; and thus analyze the evolutionary trends and functional consequences of alternative splicing in greater depth and at better resolution.
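The distinction between synonymous and non-synonymous substitutions can be made concrete with a small sketch. The code below compares two aligned, in-frame coding sequences codon by codon under the standard genetic code and counts codons that differ without changing the amino acid versus codons that differ and change it. It is a simplified illustration with hypothetical function names and toy sequences: it produces raw counts of differing codons, not a normalized rate estimate of the kind a full evolutionary analysis would require.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def count_codon_changes(seq_a, seq_b):
    """Count (synonymous, non_synonymous) codon differences.

    Both sequences must be aligned, in frame, and of equal length.
    Codons containing gaps or ambiguous symbols are skipped.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    synonymous = non_synonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous

# Toy example: GCT -> GCC keeps alanine; AAA -> AGA changes lysine to arginine.
print(count_codon_changes("ATGGCTAAA", "ATGGCCAGA"))  # (1, 1)
```

Applying such a count separately to alternative and constitutive regions of the same genes would show, in the crudest form, the kind of comparison described above.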
Last updated September 2009