A new analysis of gene duplication across the human genome reveals far more variation in gene copy number than anticipated.

When scientists announced in 2003 that they had finished the Human Genome Project, they were quick to clarify that sequencing of the full human genome was not yet complete. As much as six percent of the genome was beyond the reach of available technology, leaving regions on the 23 pairs of human chromosomes unsequenced.

For the last seven years, Howard Hughes Medical Institute researcher Evan Eichler has been working on part of that unfinished business. It turns out that one of the reasons the genome sequencers could not deliver a fully sequenced and annotated genome is that about five percent of the human genome contains duplicated genes. Eichler, who is one of the world’s foremost experts on gene duplication, has shown that the human genome changes constantly and that duplicate sequences are among the fastest-evolving regions.

Now Eichler, who is at the University of Washington School of Medicine, has teamed with graduate students Jacob Kitzman and Peter Sudmant and colleagues from the University of Washington and Agilent Technologies Inc. to sort out much of that duplication. In an article published in the October 28, 2010, issue of Science, Eichler’s team provides a new, highly detailed analysis of gene duplication across the human genome. By analyzing duplicate genes in the genomes of 159 people, including some from the Yoruba people of Nigeria, Utah residents with ancestors from northern and western Europe, Japanese people, and Han Chinese, the researchers uncovered wide variation in gene copy number both within groups and between groups, far more than anticipated. The variability, Eichler said, affects many genes that play a critical role in brain development. It also helps tell the story of human evolution, and may provide important clues to disease development.

Although humans have two copies of most genes in their genome, there are many exceptions to this rule. In some cases, individuals can have dozens of copies of a single gene scattered across multiple chromosomes. Researchers knew that such gene duplication existed, but until a year ago, it was impossible to distinguish one gene copy from another, or even to tell how many copies of a given gene existed in a person’s DNA.

The results of the new study, Eichler said, are eye-opening. “I knew these areas were variable, but to be able to see it, it was almost as if a veil was lifted. We’re not talking a couple of base pairs being different. We’re talking wholesale loss or gain of an entire gene,” he said. “The landscape is very complex and there is a lot more diversity here than I would have anticipated.”

The dramatic variation revealed in the study echoes that found in another large-scale analysis of the human genome, published in the journal Nature on October 28, 2010, by the 1000 Genomes Consortium, of which Eichler is a member. That analysis, a pilot study for a larger project, includes data from over 800 people and highlights around 16 million single-base-pair DNA variations, many of which were previously unknown.

To analyze gene duplications, Eichler’s team developed an algorithm that can distinguish a meaningful tune from random noise in the genetic tumult. They used the algorithm to identify how many copies of a gene a genome contains and where those copies live. The genome analysis took note of genes that are duplicated as many as 48 times. It also determined whether a gene had disappeared completely.

“The data, I think, are pretty cool,” Eichler said. “It’s the first time you can look at these gene families and say what their copy number is, and how much variation there is between individuals and population groups.” The difference in copy numbers between populations may signal important adaptations to specific environments.

By comparing the pattern of human gene copy number to gene copy numbers in gorillas, chimpanzees, and orangutans, Eichler’s team also revealed differences between humans and great apes that had arisen during evolution.

Eichler noted that many of the genes whose copy numbers had changed, whether through evolution or across populations, are important in immune response or neural development. He said the findings also raise some interesting questions about the evolution of human neurobiology. He and his colleagues found that many genes involved in brain function differed in copy number between humans and apes. These include genes important in neuronal migration, such as SRGAP2; a gene for a receptor for the neurochemical dopamine, a reward signal in the brain; genes implicated in visual-spatial deficits; genes responsible for abnormal head size; genes connected to social relationship deficits; and genes that play a role in intellectual disability and epilepsy.

As part of their study, the researchers identified 4.1 million unique markers that can be used like tags to quickly identify duplicate genes and determine where they live on the genome. “We’ve shown a way into neglected parts of the genome. It will get our hands around how to study genetic variations of these regions,” Eichler said.

For More Information

Jim Keeley 301.215.8858