For 10 years, scientists knew that a severe form of microcephaly, an inherited brain malformation, was due to a mutation on chromosome 19. Christopher A. Walsh and other researchers had even narrowed the search for the mutation to a particular stretch of the chromosome. But that stretch was long and densely packed, spanning 148 genes. Identifying a single mutation among those genes was a daunting task.
“It was staring us in the face for a decade,” says Walsh, an HHMI investigator at Children’s Hospital Boston, “but it was in such a packed area of the genome that no one wanted to go after it.”
Scientists had no way to quickly sequence many genes at once. They could painstakingly sequence the genes one by one, or they could sequence an entire human genome—far more expensive and just as time-consuming.
Finally, in 2009, a new automated method opened the floodgates. Called exome sequencing, it allows researchers to quickly piece together the sequence of the exome, the 1 percent of the genome that encodes proteins. Focusing on this small portion, where many disease-related genes had already been found, made sense.
Researchers admit, however, that exome sequencing ignores mutations in the other 99 percent of the genome. That includes the regulatory sequences that influence whether a protein is made and how much of it is produced, as well as stretches of nucleotides whose functions are unknown. And there’s no shortcut for interpreting the data that exome sequencing produces. Once the cost of whole genome sequencing drops, exome sequencing will likely become obsolete. For now, though, it’s giving scientists a head start on studying the human genome.
In October 2010, barely a year after the first reports of exome sequencing being used to locate disease genes, Walsh reported the gene mutations responsible for one form of microcephaly. He used exome sequencing to burrow into the 148 genes on chromosome 19 and found that mutations in WDR62, a gene expressed in developing neurons, are involved. Two other labs, publishing within months of Walsh’s paper (one shortly before it and one shortly after), used exome sequencing to reach the same result independently.
“It was a mountain that no one could climb and then as soon as the tools were developed to make it easier, everybody could do it,” says Walsh. Today, for many labs, exome sequencing is the go-to method to pin down genetic mutations responsible for rare diseases. And researchers who study more common afflictions—like heart disease and autism—are using it to make inroads as well.
For some researchers, exome sequencing is enabling discoveries that would have been impossible without it. For others, it’s speeding the pace of discovery.
“The goal of my lab used to be to identify one disease gene per year,” says HHMI investigator Joseph G. Gleeson, who studies the genetics of pediatric brain disorders at the University of California, San Diego. “Now, we’re identifying one or two per week. It’s like a dam opening up.”
Before 2009, Gleeson, Walsh, and others who wanted to find the gene mutations responsible for an inherited disorder had to build extensive pedigrees of families with the disease. The more family members they could find, the better the odds of uncovering the relevant mutations. Then, they used genetic linkage studies—a classic technique based on observations made in the late 1800s—to narrow down the location of the mutation.
When egg and sperm cells form, genetic material is shuffled between matching chromosomes to form unique combinations. The idea behind genetic linkage is that stretches of DNA lying close to each other on a chromosome are likely to stick together and be inherited as a bundle after this shuffle. So by finding known genetic markers shared by family members with a disorder, and lacking in those without it, scientists can deduce that the disease-causing mutation lies nearby. But linkage studies are tedious: researchers must test dozens of family members for genetic markers. Even once they crunch the numbers, they are often left with a large swath of chromosome that may or may not contain the mutation they’re looking for.
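The number crunching in a linkage study usually means computing a LOD score: the base-10 logarithm of the odds that a marker and a disease are linked rather than inherited independently. The Python sketch below is a simplification, not any lab’s actual pipeline. It assumes every meiosis in the family can be scored unambiguously as recombinant or not (so-called phase-known meioses), and the function names are illustrative. By convention, a LOD score of 3 or more (odds of roughly 1,000 to 1) is taken as evidence of linkage.

```python
import math


def lod_score(recombinants: int, non_recombinants: int, theta: float) -> float:
    """LOD score for phase-known meioses at recombination fraction theta.

    Compares the likelihood of the observed meioses if the marker and the
    disease locus are linked (recombination fraction theta) with the
    likelihood under free recombination (theta = 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")

    def log10_power(p: float, k: int) -> float:
        # log10(p ** k), treating 0 ** 0 as 1 and 0 ** k (k > 0) as impossible
        if k == 0:
            return 0.0
        if p == 0.0:
            return float("-inf")
        return k * math.log10(p)

    n = recombinants + non_recombinants
    linked = log10_power(theta, recombinants) + log10_power(1.0 - theta, non_recombinants)
    unlinked = n * math.log10(0.5)
    return linked - unlinked


def max_lod(recombinants: int, non_recombinants: int, steps: int = 50) -> tuple[float, float]:
    """Scan theta from 0 to 0.5 on a grid and return (best_theta, best_lod)."""
    best_theta, best_lod = 0.5, 0.0  # the LOD score is exactly 0 at theta = 0.5
    for i in range(steps + 1):
        theta = 0.5 * i / steps
        score = lod_score(recombinants, non_recombinants, theta)
        if score > best_lod:
            best_theta, best_lod = theta, score
    return best_theta, best_lod


# Ten informative meioses, none of them recombinant
theta, lod = max_lod(0, 10)
print(f"max LOD {lod:.2f} at theta = {theta}")  # → max LOD 3.01 at theta = 0.0
```

Real linkage software must also cope with unknown phase, missing genotypes, and incomplete penetrance, which is part of why the work was so slow.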
Each exon, or protein-coding segment, within this area then had to be individually isolated and sequenced through a series of reactions. “In a typical project, there might be 200 genes in your candidate sequence and you were faced with running thousands of reactions to test for potential mutations,” Gleeson recalls.
A handful of labs expanded this technique to isolate and manually sequence all the exons in a genome, a massive undertaking. Their success in using this method to identify genes, however, suggested that if it were made quicker and cheaper, it could be useful on a broad scale. During a six-month period in 2009, several research teams came up with an idea that made the technology more feasible, and labs across the country picked it up.
HHMI investigator Richard P. Lifton at Yale School of Medicine was among the first to realize there was a quicker way to sequence exomes. He proposed using a microarray to pull the exome out of the rest of the genome, so that only the protein-coding DNA would need to be sequenced. At the same time, biologist Jay Shendure at the University of Washington, Seattle, was pursuing a similar idea.
“This was a natural next step to what else had been going on in the field of next-generation sequencing,” says Shendure. In 2008 and 2009, he adds, sequencing a full genome cost as much as $250,000, depending on the methods used. By comparison, the first exomes were sequenced for about $10,000, plus the initial cost of the sequencing equipment.
“Close to 3,000 disease genes had been mapped at that point and the obvious fact to us was that very few of these had fallen outside the exome,” says Lifton. “So at a time when the cost of sequencing was still relatively high, it occurred to us that we could get a huge advantage if we could fish out the exomes and just sequence them.”
Lifton worked with postdoc Murim Choi and NimbleGen, a private company, to develop an exome-sequencing platform. Genomic DNA is first cut into fragments of manageable size, which are then hybridized to a microarray carrying probes that match sequences throughout the exome. The fragments captured by the probes, which ideally make up the whole exome, are then sequenced.
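The toy Python model below illustrates only the selection step of capture, under a strong simplifying assumption: a fragment counts as “captured” if it shares an exact stretch of k bases with a probe. Because genomic DNA is double-stranded, a fragment can match a probe in either orientation, so the model checks each probe and its reverse complement. Real hybridization tolerates mismatches and depends on probe length and temperature; the function names, the probe, and the fragments are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_probe_index(probes: list[str], k: int) -> set[str]:
    """k-mers from every probe in both orientations, since a fragment may
    derive from either strand of the double-stranded genome."""
    index: set[str] = set()
    for probe in probes:
        index |= kmers(probe, k)
        index |= kmers(reverse_complement(probe), k)
    return index


def capture(fragments: list[str], probes: list[str], k: int = 12) -> list[str]:
    """Keep fragments sharing at least one exact k-mer with the probe index.

    Toy stand-in for hybridization; real capture tolerates mismatches.
    """
    index = build_probe_index(probes, k)
    return [frag for frag in fragments if kmers(frag, k) & index]


probes = ["ATGCGTACCTAGGATC"]      # hypothetical exon probe
fragments = [
    "CCATGCGTACCAAAAA",            # overlaps the probe's own strand
    "GGGGGGGGGGGGGGGG",            # off-target fragment
    "TTTTGATCCTAGTTTT",            # overlaps the opposite strand
]
print(capture(fragments, probes, k=8))  # → ['CCATGCGTACCAAAAA', 'TTTTGATCCTAGTTTT']
```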
As a proof of concept that the method could be used to discover disease-related genes, Lifton’s lab used exome sequencing to take a close look at the DNA of a five-month-old Turkish boy diagnosed with Bartter syndrome, a rare disease characterized by low levels of potassium in the blood. Exome sequencing changed the child’s diagnosis: it showed that he carried a mutation in a chloride transport protein involved in a different disease, congenital chloride diarrhea. Lifton’s team got this result from the DNA of a single affected patient, with no need to study dozens of affected individuals. Within a month, Shendure’s team published its own proof of concept.
The power of exome sequencing was immediately clear to geneticists who had spent years toiling on linkage studies. Families whose pedigrees they had built for particular diseases could now be analyzed in mere weeks, rather than languishing on seemingly endless waiting lists.
In January 2012, Lifton identified two genes responsible for an inherited form of hypertension in 41 families. The genes encode components of a ubiquitin ligase complex never before linked to blood pressure, and the work has advanced understanding of normal blood pressure control. Gleeson and HHMI investigator Christine E. Seidman at Brigham and Women’s Hospital have both used exome sequencing to diagnose hard-to-pin-down diseases (see Web Extra sidebar, “Exomes in the Clinic”).
Exome sequencing is proving useful for studying tumors as well. Researchers sequence exomes from a cancer patient’s tumor tissue and from healthy tissue, such as a cheek swab or blood sample. Comparing the two reveals the genetic mutations that the tumor cells have accumulated and that the patient’s healthy cells lack.
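At its simplest, the tumor-versus-normal comparison is a subtraction: variants found in the tumor but absent from the patient’s healthy sample are candidate tumor-acquired (somatic) mutations. The Python sketch below assumes variant calls have already been made for both samples. The `Variant` record, the 5 percent allele-fraction cutoff, and the coordinates in the example are hypothetical, and real pipelines also model sequencing errors, read depth, and the fraction of normal cells mixed into a tumor sample.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def somatic_candidates(
    tumor_vafs: dict[Variant, float],
    normal_calls: set[Variant],
    min_vaf: float = 0.05,
) -> list[Variant]:
    """Variants seen in the tumor but not in the matched normal sample.

    tumor_vafs:   Variant -> variant allele fraction in the tumor
                  (share of reads supporting the alternate base)
    normal_calls: Variants detected in the patient's healthy tissue
    min_vaf:      hypothetical cutoff that drops variants seen in very few reads
    """
    return sorted(
        v for v, vaf in tumor_vafs.items()
        if v not in normal_calls and vaf >= min_vaf
    )


# Made-up coordinates, for illustration only
tumor = {
    Variant("chr2", 1000, "C", "T"): 0.40,   # absent from normal: candidate
    Variant("chr7", 500, "A", "G"): 0.48,    # inherited, also in normal
    Variant("chr17", 42, "G", "A"): 0.02,    # too few supporting reads
}
normal = {Variant("chr7", 500, "A", "G")}
print(somatic_candidates(tumor, normal))
# → [Variant(chrom='chr2', pos=1000, ref='C', alt='T')]
```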
Analyses of all the protein-encoding genes in a tumor have revealed mutations in genes that otherwise would never have been implicated in cancer, says HHMI investigator Bert Vogelstein of the Johns Hopkins University School of Medicine. In 2009, Vogelstein and his colleagues used exome sequencing to discover a mutation in a gene called IDH1 in brain tumors.
“This was a gene that was thought to be involved in basic metabolism and no one would have thought to check whether it was mutated in cancer,” says Vogelstein. Since then, scientists have found the same mutation in other cancers, including leukemia. The discovery has opened a new area of research, he says, aimed at understanding how cancer cells alter their metabolism to survive.
Today, more than 25 cancer types have been subjected to exome sequencing, in many cases revealing surprises (see Bulletin, May 2011, “A Crowd in the Kitchen”), or at least newly identified genes. In July 2011, Vogelstein and HHMI investigator Todd R. Golub of the Dana-Farber Cancer Institute each published separate analyses of the exomes of head and neck cancers online in Science. The studies revealed a handful of mutations that could help drive the development of new therapeutics for these cancers.
Focus on the Drivers
Hearing the way researchers praise exome sequencing for expediting their work, you’d think it was the solve-all technique. But it has its limits. After all, it provides only what it advertises: the sequences of exomes. For scientists, interpreting those sequences still requires old-fashioned elbow grease.
“In cancer sequences it’s often difficult to distinguish the wheat from the chaff,” says Vogelstein. The wheat, in cancer genetics, includes the mutations that drive cells to become cancerous or encourage a tumor’s growth. The chaff consists of mutations that just happen to be present alongside them, called passenger mutations.
And it’s not just a problem in cancer genomics. Every researcher who uses exome sequencing is faced with a pile of data to sort through. Sequencing is the easy part; you prepare a sample of DNA and feed it into a lab machine. “The interpretation is the hard part,” says Walsh. “If you have a big family to study, it will be easier. But interpreting the hard cases is still hard.”
Then there’s that other 99 percent of the genome. If researchers can’t find a disease-causing mutation in the exome, is it because the mutation lies outside the exome, perhaps in a regulatory region, or because they just haven’t pinpointed the right mutation within the exome?
“It’s one of the really pressing questions,” says Golub. “How much are we missing? We’re beginning to see cancer types that appear to have particularly low mutation rates, based on the exome. It could be that you don’t need many mutations [to cause the cancer]. But it also could be that you need mutations in the other 99 percent of the genome.”
Eventually, researchers will use whole genome sequencing the way they use exome sequencing today. It’s a matter of waiting for the cost to drop, they all say. Whole genome sequencing now costs less than $5,000 per genome and is quickly approaching $1,000, but that is still five to 10 times more than sequencing an exome. And the costs of storing and processing whole genome data can be as much as a hundred times higher than those for exome data, and they are dropping more slowly than sequencing costs. The initial cost of sequencers, which can be used for either exome or whole genome sequencing, is also part of the equation.
“Exome sequencing, while it’s amazing, is really just a bridge until the price drops further and we can do whole genome sequencing,” says Gleeson.
“If you were told you could sequence 1 percent of the genome and you asked yourself what’s the most important 1 percent of the genome to sequence, you’d say it’s probably the 1 percent that makes proteins,” says Golub. “Which isn’t to say that nothing else is important. But it’s a good place to start.”