Gene duplication followed by adaptive mutation is considered one of the primary forces for the evolution of new gene function. Duplicated sequences are also dynamic regions of rapid structural change as a result of unequal crossover. Human and great ape genomes contain more complex interspersed segmental duplications than the genomes of other mammals. We have shown that this relatively unusual aspect of great ape organization sensitizes our genome to a high rate of structural mutation and copy-number variation, predisposing our species to neurodevelopmental disease as a result of genomic imbalance. I hypothesize that this increased burden is offset by the emergence of novel genes and increased genetic plasticity that are important in the evolutionary adaptation of humans and great apes. I divide my research program into three interdependent lines of investigation: (1) the evolutionary origin and consequence of duplicated sequences with respect to primate gene innovation; (2) the pattern of natural human genetic variation and the forces that have shaped it, such as selection and conversion; and (3) the role of structural variation as a model for understanding the genetic basis of neurocognitive disease and autism spectrum disorders (ASD) more broadly in the human species.
Comparative sequencing and analyses of primate genomes have shown that segmental duplications have accumulated nonrandomly in both time and space. We estimate that the rate of segmental duplication accumulation was approximately three- to fourfold higher prior to the divergence of humans and great apes and that this rate declined after speciation. We have developed a framework to reconstruct the history of segmental duplications over the past 40 million years of evolution. Our results indicate that a set of "core duplicons" was the focal point for the emergence of the interspersed duplication architecture. These segments encode human/great ape–specific gene families (e.g., NPIP, TBC1D3, LRRC37, NBPF, RANBP2, and GLP). The functions of these gene families are largely unknown, although some families have been generally referred to as neuro-oncogenes and may be important in cell proliferation. At the periphery of these cores, the juxtaposition of younger duplicated sequences has led to the formation of truncated and chimeric transcripts in a process akin to "exon shuffling." We are focusing on understanding the function of 31 human-specific gene families, some of which appear to contribute to unique aspects of human brain development (e.g., SRGAP2, ARHGAP11B). We hypothesize that the expansion and structural changes of these newly minted gene families at novel locations have been selectively advantageous, serving as a counterbalance to the increased copy-number-variant (CNV) burden.
Human Genetic Variation
We are developing novel computational and experimental methods that use second- and third-generation sequencing technologies to characterize fully the patterns of genetic variation in duplicated sequences. We have two objectives. First, accurate genotyping of duplicated genes will allow us to associate genetic variation with human phenotypes, providing direct insight into the function of the approximately 1,000 genes mapping within segmental duplications. Second, a detailed understanding of long-range and local patterns of genetic variation will allow us to assess the mutational and selective forces that have shaped these regions. Large-scale sequence analyses of different structural haplotypes have provided evidence for partial selective sweeps, an intimate association of large-scale inversions and duplicative transposition, and long-range nonallelic gene conversion events focused around the core duplicons. Most of this higher-order pattern of human genetic variation, however, has yet to be elucidated because it is associated with dynamic multicopy alleles that have been problematic to assay.
Neurocognitive Disease and Autism
We have used the human duplication architecture to predict hot spots of genomic instability, leading to the discovery of numerous microdeletions and microduplications associated with intellectual disability (ID) and ASD. We estimate that 14 percent of childhood developmental delay is caused by rare CNVs, with more than one-third of these mutations mediated by segmental duplications. Our findings have emphasized the importance of dosage imbalance and the potential role of multiple rare disruptive mutations as a model for these diseases. In contrast to sporadic microdeletions associated with syndromes, we have discovered a significant co-occurrence of two or more large CNVs when phenotypic outcome is more variable. We propose that this multiple-hit model may explain the comorbidity of neuropsychiatric and neurodevelopmental diseases in specific families. Moreover, less penetrant CNVs are more likely to be transmitted from mothers, in contrast to de novo mutations, which arise primarily within the male lineage. Full realization of this model requires that the complete spectrum of genetic variation be understood, including single-nucleotide polymorphisms, insertions/deletions, and CNVs. We are continuing to develop methods to detect smaller CNVs (a few hundred base pairs), inversions, and smaller genomic hot spots in patient samples and to integrate this information with recurrent loss-of-function mutations in both unique and duplicated genes. Our initial analyses have already led to the discovery of more than a dozen new ASD/ID genes that fall within specific networks related to chromatin remodeling, synaptic function, and cell proliferation (e.g., CHD8, ADNP, DYRK1A, and TBR1). Over the next few years, we plan to systematically assess the full spectrum of genetic variation in more than 20,000 ASD/ID patients and controls through collaboration with families, clinicians, and researchers to develop a comprehensive model to explain the genetic architecture of these diseases.
Our goal is to understand the relationship between copy-number variation, single-nucleotide variation, and diverse clinical outcomes, including epilepsy, ASD, schizophrenia, and childhood developmental delay. We hypothesize that integrating these data will provide further insight into the unique neurocognitive adaptations that have arisen in our species.
My research program is committed to understanding the significance of human segmental duplications at the structural, genic, and phenotypic levels. We posit that genomic structural changes (deletions, duplications, and inversions) have contributed disproportionately to both human disease and adaptive evolution in the human species. The work has suggested a model whereby dosage imbalance of multiple genes underlies neurocognitive diseases such as autism and where the increased susceptibility to copy-number variation is linked to the emergence of novel genes important in human neurocognitive adaptation. This program is nontraditional in that we work on some of the most biologically complex regions of the genome, which are not readily tractable by routine genomic technologies. My research philosophy combines various disciplines (evolution, human genetics/genomics, and computational biology) to understand the mechanisms and consequences of novel forms of complex human genetic variation. Such a synergism provides a powerful strategy to address the evolution and disease predilection of the human species. The development of tools and the conditions required to pursue such a holistic approach are unprecedented in studies of genome evolution. With the advent of large-scale comparative sequencing and the integration of experimental and computational genomic approaches, such multifaceted research objectives have become increasingly feasible endeavors. My overall goal is to contribute to this new era of genomic science as it applies to evolution and medicine and to impart the value of this scientific design, through teaching and mentorship, to the next generation of scientists.
Grants from the National Institutes of Health, Simons Foundation Autism Research Initiative (SFARI), and Paul G. Allen Family Foundation provided support for these projects.
As of May 21, 2015