In virtually all organisms, the genome encodes the information needed to interpret itself. The DNA-binding and macromolecular assembly activities that control transcription are embedded in the protein sequences, which themselves are read through a series of protein- and RNA-mediated RNA recognition events. Genome sequencing has underscored the complexity of vertebrate genome function and evolution in this regard. First, every vertebrate genome encodes an estimated 1,500 sequence-specific DNA-binding transcription factors and a comparable number of proteins with RNA-related activities. Second, there are approximately one million conserved non-exonic elements in the human genome, representing the majority of the human genome sequence under selection, and many of these elements are likely to be cis-regulatory. In addition, there are many known regulatory elements that are not conserved, consistent with the prevailing view that changes in gene regulation are a major mechanism in the adaptation and evolution of species. Given that most apparent transcription factors and the vast majority of potential cis-regulatory sequences are functionally uncharacterized, understanding how the vertebrate “regulome” functions and evolves is a major part of understanding how the vertebrate genome relates to the biology of the organism. Fortunately, these genomic observations provide a starting point for both empirical and computational approaches to describing how genomes function on a global basis to orchestrate biology at the molecular and cellular levels.
Research in my laboratory encompasses development of techniques, reagents, and datasets for genomic research; data mining; and exploration of gene functions and gene regulation in organisms including yeast, Drosophila, and vertebrates. For example, our yeast tet-promoter strain collection has been distributed to dozens of labs, and individual strains have been sent to hundreds of investigators. Following our initial inference that many uncharacterized yeast genes are involved in RNA processing, we created a microarray technique to rapidly assess mutants for RNA processing defects. Our mRNA profiling datasets have demonstrated a broader relationship between gene function and gene expression than was previously appreciated and have revealed functions for uncharacterized genes. For instance, our initial survey of the expression of both known and predicted genes in the mouse identified over one thousand uncharacterized or novel genes with expression patterns diagnostic of specific functions. In the same dataset, we identified putative new transcription factors expressed specifically in embryonic stem (ES) cells and early developmental stages, and we recently demonstrated that one of them, Zfp206, controls ES cell differentiation and gene expression, including expression of other apparent ES-cell-specific regulators.
With the availability of genome sequences for diverse vertebrates, my group has become interested in how the evolution of vertebrate gene expression patterns corresponds to the evolution of the vertebrate genome and to properties of vertebrate organisms. As an initial approach, we are planning to generate microarray profiles across similar, related, and distinct panels of tissues for known and predicted genes in a spectrum of organisms including fish, birds, and amphibians, using a single established protocol to maximize the validity of comparisons. We anticipate that with these data in hand it will be possible to address a variety of questions: What is the general conservation of expression over the last 500 million years? How informative are expression patterns for distinguishing orthologues from paralogues in cases where sequence alone cannot do so? Do lineage-specific expression patterns correspond to lineage-specific biological features and lineage-specific cis- and trans-regulators? Can we infer lineage-specific functions of individual cis- and trans-acting transcriptional regulators, even if they are not lineage-specific? Can we also infer the evolutionary origins and path of individual regulatory mechanisms? How much information do gene expression data from diverse species provide beyond what can be gained from comparative genomics? How, generally, do the DNA-binding specificities and physiological functions of transcription factors evolve over time? The answers to these questions will affect not only how we think about evolution, but also how comparative genomics is used to infer the human transcriptional regulatory network more precisely.
Work in my laboratory is supported by the Canadian Institutes of Health Research, Genome Canada, the Ontario Genomics Institute, the Canadian Institute for Advanced Research, the National Institutes of Health, and the Ontario Research Fund.
Last updated July 2010