Genomic duplication followed by adaptive mutation is considered one of the primary forces in the evolution of new functions. Duplicated sequences are also dynamic regions of rapid structural change during chromosome evolution. My long-term goal is to understand the evolution, pathology, and mechanism(s) of recent gene duplication and DNA transposition within the human genome. Our work involves the systematic discovery of these regions, the development of methods to assess their variation, the detection of signatures of rapid gene evolution, and ultimately the correlation of this genetic variation with phenotypic differences within and between species.
My research addresses a new paradigm that has emerged in the past few years regarding the dynamic nature of human genome structure. Particular chromosomal regions have been shown to be active in the acquisition, duplication, and dispersal of large gene-containing genomic segments. I hypothesize that these "jumping genomic segments," also known as segmental duplications, are part of an ongoing evolutionary process that generates a novel form of large-scale DNA variation and contributes to rapid primate gene evolution. At a structural level, duplications may be viewed as dynamic mutations: an initial event increases the probability of a second. The sequence homology created by duplication increases the probability of additional rounds of gene conversion, unequal crossing-over, and subsequent rearrangement. Not surprisingly, many of the largest blocks of sequence similarity generated by this process are substrates for recurrent chromosomal structural rearrangements associated with certain human diseases and disease susceptibility. Compared with unique nonfunctional or "neutral" DNA, these regions of the genome represent hot spots of evolutionary and contemporary change. Their impact on evolution and disease is only beginning to be understood. Our research falls into three broad categories.
Human Variation and Disease
The combined incidence of detected de novo rearrangements mediated by segmental duplications is estimated at 1 in 1,000 live births. These rearrangements account for 3 percent of all birth defects in which mental retardation is the primary diagnosis. We have identified ~130 regions of the human genome that we believe show a predilection to segmental aneusomy. Our paralogy map of the human genome therefore provides a "road map" for investigating regions with an increased probability of rearrangement. Children with undiagnosed mental retardation provide a sensitized background for the study of copy-number variation. One goal of our research is to assess the frequency of duplication-mediated segmental aneusomy within (1) the normal human population and (2) a population of patients with idiopathic mental retardation. Our aim is to address two fundamental questions: What is the nature and frequency of duplication-mediated structural polymorphisms within the human genome? Is there an excess of de novo events among children with mental retardation and congenital birth defects?
Our primary method for detecting copy-number variation is array comparative genomic hybridization (array CGH), using a well-characterized set of probes targeting regions flanked by low-copy repeat sequences. As a second method, we have developed a computational approach that compares paired-end sequences against the reference genome. This approach has identified hundreds of sites of potential structural polymorphism, 82 of which encompass genes. I hypothesize that copy-number variation (deletion and duplication) is an underestimated mutational force contributing to genetic disease, particularly at susceptibility loci. Characterizing this variation will provide the basis for developing the assays needed to perform association studies of both simple Mendelian and complex human genetic diseases.
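To illustrate the paired-end logic, the following sketch classifies a single mapped end-sequence pair by comparing its span on the reference genome with the expected insert size of the sequenced clone. It is a minimal sketch under stated assumptions, not our pipeline: the `EndPair` record, the 40-kb mean insert, the 3-kb standard deviation, and the three-standard-deviation cutoff are all hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical clone insert-size parameters (assumptions for illustration only).
MEAN_INSERT = 40_000
SD_INSERT = 3_000
N_SD = 3.0


@dataclass
class EndPair:
    """One pair of end sequences from a single clone, mapped to the reference (hypothetical record)."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify(pair: EndPair,
             mean: float = MEAN_INSERT,
             sd: float = SD_INSERT,
             n_sd: float = N_SD) -> str:
    """Label a mapped end pair as concordant or as a candidate structural variant."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    # Ends of a correctly oriented clone map to opposite strands;
    # same-strand placement suggests an inversion relative to the reference.
    if pair.strand1 == pair.strand2:
        return "inversion"
    span = abs(pair.pos2 - pair.pos1)
    if span > mean + n_sd * sd:
        # Ends lie farther apart on the reference than the clone can span:
        # the sampled genome lacks sequence present in the reference.
        return "deletion"
    if span < mean - n_sd * sd:
        # Ends lie closer together than expected:
        # the sampled genome carries extra sequence absent from the reference.
        return "insertion"
    return "concordant"


if __name__ == "__main__":
    examples = [
        EndPair("chr1", 1_000, "+", "chr1", 41_000, "-"),
        EndPair("chr1", 1_000, "+", "chr1", 101_000, "-"),
        EndPair("chr1", 1_000, "+", "chr1", 11_000, "-"),
        EndPair("chr1", 1_000, "+", "chr1", 41_000, "+"),
    ]
    for p in examples:
        print(classify(p))
```

A single discordant pair is weak evidence on its own; in practice a candidate site would more plausibly be called only when several independent pairs support the same event.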
Phylogenetics and the Mechanism of Origin
As a complement to our study of human variation, we focus on understanding natural genomic variation between humans and other primates. Because of the limitations of assembled genome sequence, we employ computational tools developed during our analysis of the human genome to characterize lineage-specific and shared duplications between humans and great apes. In addition to genome-wide analyses, targeted high-quality sequencing of specific regions will provide the long-range continuity needed to model evolutionary processes within these regions. There are two objectives. Our first objective is to reconstruct the evolutionary history of every recent (<40 million years) segmental duplication within the human genome. In collaboration with Pavel Pevzner (University of California, San Diego), we are developing computational methods to identify ancestral states based on outgroup genomic data and to extract historical associations by applying graph theory, an approach that promises to deconvolute the subrepeat structure of mosaic duplications. Using comparative sequence from these regions, we are also modeling the frequency of gene conversion and its impact on the structure of these regions.
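One simple way to see how outgroup data fix ancestral states is Fitch parsimony on a fixed species tree. The sketch below is an illustrative stand-in, not the graph-theoretic methods described above: it assumes the topology (((human, chimpanzee), gorilla), orangutan), with orangutan as the outgroup, and scores a duplication as present (1) or absent (0) in each species. It reports the minimum number of gain/loss events and, when a single gain with no loss explains the pattern, the clade in which the duplication most parsimoniously arose. The tree and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple, Union

# Hypothetical fixed topology; orangutan serves as the outgroup.
Tree = Union[str, tuple]
TREE: Tree = ((("human", "chimpanzee"), "gorilla"), "orangutan")


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Return the leaf set of every node; the last entry is the whole (sub)tree."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    left, right = tree
    lc, rc = clades(left), clades(right)
    return lc + rc + [lc[-1] | rc[-1]]


def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Fitch bottom-up pass: candidate ancestral states and minimum number of changes."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    ls, lcost = fitch(left, states)
    rs, rcost = fitch(right, states)
    shared = ls & rs
    if shared:
        return shared, lcost + rcost
    # No shared state: take the union and count one change.
    return ls | rs, lcost + rcost + 1


def infer_origin(tree: Tree,
                 states: Dict[str, int]) -> Tuple[Optional[FrozenSet[str]], int]:
    """Clade where a single gain explains the pattern (None if no clade fits),
    plus the minimum number of changes on the tree."""
    carriers = frozenset(sp for sp, present in states.items() if present == 1)
    _, changes = fitch(tree, states)
    for clade in clades(tree):
        if clade == carriers:
            return clade, changes
    return None, changes
```

For example, a duplication found in human and chimpanzee but not in gorilla or orangutan maps to the human-chimpanzee clade with one change, whereas one shared by human and gorilla alone returns None with two changes, flagging a pattern that a single gain cannot explain (for instance, loss, independent duplication, or gene conversion).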
Our second objective is to understand the mechanism underlying segmental duplication. We have recently developed a donor-acceptor model for human duplications indicating that Alu repeats are key elements in the mobilization of duplications, while low-complexity (GC- and AT-rich) sequences may account for the preferential integration of these elements into specific chromosomal regions. We propose to test this model directly by identifying and characterizing lineage-specific duplications within humans, chimpanzees, and gorillas. Studying the phylogenetic relationship of such sequences to their antecedents will provide fundamental insight into putative donor and acceptor sequences at the sites of transposition and integration, respectively. To date, in collaboration with Eric Green (National Human Genome Research Institute), we have cloned and mapped 12 of these new insertion sites within gorilla, chimpanzee, and orangutan. Large-scale sequence analyses of the integration sites suggest coordinated deletion at the insertion site during segmental duplication. Ultimately, these data will serve as the basis for future experimental modeling of this process.
Gene and Transcript Innovations
The process of segmental duplication provides a vehicle for primate gene innovation in two ways. First, duplications may lead to the adaptive evolution of genes "liberated" from the selective constraints of their ancestral function. Second, the accumulation of diverse duplications at specific locations in the genome juxtaposes different gene cassettes in novel genomic contexts. This has led to the formation of "chimeric" transcripts in a process akin to "exon shuffling." Although most duplicated genes are degraded into pseudogenes by random mutation, functional products occasionally emerge. One highlight of our research has been the discovery of both rapidly evolving genes and fusion genes specific to the human and great ape lineages. The latter genes show a bias toward germline expression. We will extend this work to identify such novel gene products and compare the expression profile of each with that of its progenitor gene. Functionality will be inferred by identifying signatures of either significant positive or purifying selection. Genes that show evidence of significant positive selection will be assessed for intraspecific variation as a test for a selective sweep through the population. Determining the function of such genes in the absence of model organisms is a significant challenge and will remain my long-term objective.
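A common way to look for such signatures is the ratio of nonsynonymous to synonymous substitution rates (dN/dS): values well above 1 point to positive selection, and values well below 1 to purifying selection. The sketch below computes point estimates for two aligned coding sequences using a simplified Nei-Gojobori counting method with a Jukes-Cantor correction. It is an illustrative assumption, not the analysis used in our work: it counts changes to stop codons as nonsynonymous when estimating sites, rejects alignments containing gaps or stop codons, and performs no significance test, which would require a likelihood-based or permutation framework.

```python
import math
from itertools import permutations
from typing import Tuple

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def syn_sites(codon: str) -> float:
    """Synonymous sites: share of single-base changes at each position that keep the amino acid."""
    sites = 0.0
    for i in range(3):
        for b in BASES:
            if b != codon[i] and CODE[codon[:i] + b + codon[i + 1:]] == CODE[codon]:
                sites += 1 / 3
    return sites


def path_differences(c1: str, c2: str) -> Tuple[float, float]:
    """Synonymous and nonsynonymous differences, averaged over mutational
    pathways that avoid intermediate stop codons."""
    diffs = [i for i in range(3) if c1[i] != c2[i]]
    if not diffs:
        return 0.0, 0.0
    total_s = total_n = 0.0
    n_paths = 0
    for order in permutations(diffs):
        cur, s, n, ok = c1, 0, 0, True
        for i in order:
            nxt = cur[:i] + c2[i] + cur[i + 1:]
            if CODE[nxt] == "*":
                ok = False
                break
            if CODE[nxt] == CODE[cur]:
                s += 1
            else:
                n += 1
            cur = nxt
        if ok:
            total_s += s
            total_n += n
            n_paths += 1
    if n_paths == 0:
        # Every pathway passes through a stop codon; count all changes as nonsynonymous.
        return 0.0, float(len(diffs))
    return total_s / n_paths, total_n / n_paths


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differences for multiple hits."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1: str, seq2: str) -> Tuple[float, float]:
    """Return (dN, dS) for two aligned, in-frame, gap-free coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of three long")
    if set(seq1 + seq2) - set(BASES):
        raise ValueError("only A, C, G, T are allowed (no gaps or ambiguity codes)")
    syn = non = syn_d = non_d = 0.0
    for k in range(0, len(seq1), 3):
        c1, c2 = seq1[k:k + 3], seq2[k:k + 3]
        if CODE[c1] == "*" or CODE[c2] == "*":
            raise ValueError("stop codon found inside the alignment")
        s_avg = (syn_sites(c1) + syn_sites(c2)) / 2
        syn += s_avg
        non += 3 - s_avg
        sd, nd = path_differences(c1, c2)
        syn_d += sd
        non_d += nd
    if syn == 0 or non == 0:
        raise ValueError("no synonymous or nonsynonymous sites to compare")
    return jukes_cantor(non_d / non), jukes_cantor(syn_d / syn)
```

The ratio dN/dS then follows directly whenever dS is nonzero; a ratio above 1 would flag a candidate for the population-level follow-up described above.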
My research program is committed to understanding the significance of human segmental duplications at the structural, genic, and phenotypic levels. It is nontraditional in that we work on some of the most biologically complex regions of the genome, which are not readily tractable with available genomic technologies. Whereas many evolutionary biologists focus on the significance of highly conserved genomic sequence among distantly related species, we strive to unravel the significance of regions undergoing rapid evolutionary change among closely related primates. Our research challenges the notion of a static genome that simply decays under a neutral model of evolution. Rather, the data implicate local dynamism, with hypermutable regions distributed nonrandomly. A comprehensive assessment of this form of genetic variation will forge new links between evolutionary biology and human genetic disease.
My research philosophy combines several disciplines (evolutionary biology, human genetics and genomics, and bioinformatics) to understand the mechanisms and consequences of novel forms of variation in the human genome. This synergy of disciplines provides a powerful strategy for addressing the biological processes of genome evolution. The tools and conditions now available for pursuing such a holistic approach to genome evolution are unprecedented. With the advent of large-scale comparative sequencing and the integration of experimental and computational genomic approaches, such multifaceted research objectives have become increasingly tractable. My overall goal is to contribute to this new era of genomic science as it applies to evolution and medicine and to impart the value of this scientific design, through teaching and mentorship, to the next generation of scientists.