To a scientist, traversing a human chromosome from end to end might seem like driving across Texas: the route passes through some interesting towns, but in between there’s a whole lotta nothin’. That analogy is based on the fact that the roughly 20,000 protein-coding human genes comprise less than five percent of the genome. The only routes through these gene “towns” are vast highways of seemingly featureless, “non-coding” DNA sequence.
Scientists are now learning that what they once thought of as barren regions of the genome are actually rich with information. Those stretches of non-coding DNA contain instructions for when, where, and to what degree each of the genes should be turned on. But the precise location of those instructions has been something of a mystery. This has made it difficult to interpret genetic analyses that link the probability of developing a specific disease to variations in the genome’s non-coding regions. These kinds of studies, called genome-wide association studies, have become increasingly widespread thanks to the completion of the Human Genome Project and recent reductions in genotyping costs.
New maps of the genome produced by Howard Hughes Medical Institute early career scientist Bradley Bernstein and colleagues at MIT and the Broad Institute detail where certain markers of genetic regulation, known as histone marks, are found on each of the human chromosomes in a variety of cell types. The maps will be a valuable tool for researchers trying to interpret large-scale genetic analyses.
In the March 24, 2011, issue of the journal Nature, Bernstein and co-senior author Manolis Kellis present a collection of maps charting the topography of each of the chromosomes in different types of human cells. They report that the genomic landscape—sculpted primarily by interaction of the DNA helix with proteins called histones—varies radically in different types of cells.
Figure: Chromatin States in Nine Cell Types. Color-coded maps reveal how chromatin states vary in different cell types.
“People knew that interpreting the human genome would be complex,” says Bernstein, an associate professor of pathology at Massachusetts General Hospital and Harvard Medical School. “But what is surprising is the gradual realization that the 95 percent of your DNA that is non-coding is where much of the action is.”
To create the maps, Bernstein’s laboratory isolated from cells all of the genomic DNA that was bound by histones, which can alter DNA structure in various ways. Some types of histones sit on DNA upstream of a gene and promote the gene’s activation. Other histones coil DNA into an impenetrable knot, and still others relax DNA coils to enhance gene expression. Like other modifications that change gene expression without altering the sequence of the gene itself, these histone marks are described as epigenomic features. In all, the group collected DNA interacting with nine different histone marks.
They then sequenced that DNA to position the DNA/histone interactions along the chromosomal highway in samples derived from nine human cell types, including leukemia and liver cancers, embryonic stem cells, vessel-forming cells, and immune, muscle, mammary, skin, and lung cells. Adequate coverage of the 3 billion letters of the genome for all of the samples required sequencing over 100 billion letters, a feat made possible only by next-generation sequencing technology.
Kellis, an associate professor of Computer Science at MIT, and associate member of the Broad Institute of MIT and Harvard, devised computational strategies to interpret the data. By discovering and interpreting meaningful combinations of histone marks, his group painted the genome of each cell type in 15 different colors of chromatin, each color – or ‘chromatin state’ – corresponding to a distinct type of genomic function.
Those color-coded maps showing the genomic coordinates of each chromatin state along each human chromosome are revealed in the team’s Nature paper. The maps paint a rich set of landmarks for navigating the vast non-coding genome, and reveal important regions responsible for cell type-specific gene regulation.
“In traditional biology you learn a ton about a tiny fraction of genes,” says Kellis. “But in modern biology we can study genomic features at the whole genome level. These global views of our genome reveal new principles that are simply not visible at the single gene level.”
For example, the color maps showcase the dynamic behavior of a region of chromosome 1 containing a gene called WLS. In immune cells, WLS is wrapped in repressive chromatin, suggesting that the gene is silenced. However, flipping to the blood vessel cell map reveals active chromatin marks and abundant expression of the same gene. And in embryonic stem cells, both repressive and activating signatures pile up near the WLS start site, leaving the gene poised to be either activated or repressed at a moment’s notice.
“Projects like this address how the identical DNA, found in every cell of the body, can give rise to so many cell types,” says Jason Ernst, a postdoctoral researcher in the Kellis lab and the study’s first author. “Skin and heart cells share the same DNA sequence. We need to understand the processes that account for their differences, and our chromatin maps provide at least part of the answer.”
The current mapping project revealed tens of thousands of cell type-specific functional DNA elements in the non-coding portions of the genome. Many of these appear to act as long-range enhancers of gene expression, but each acts only in certain types of cells. The researchers were then able to connect many of the enhancers to the genes that they target and identify regulatory proteins involved in the process.
“The dynamic nature of these chromatin maps allowed us to go one step beyond genome annotation, and link elements together in regulatory networks,” says Kellis. “We found that regions that functionally interact also change in coordinated ways.”
For example, if an enhancer region is active in the same cell types as a neighboring gene, the researchers linked them together, predicting the enhancer as a control region for the gene. Further associations allowed them to predict which regions were likely to activate the enhancer regions. This led to rich networks of connections. Strikingly, the researchers found that these links agree with genetic links from mapping studies that find sequence variants that are predictive of gene expression levels.
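The linking idea can be illustrated with a toy calculation. The sketch below is not the method from the Nature paper; it simply assumes that each enhancer has an activity score and each gene an expression level in every cell type, and it links an enhancer to a nearby gene when the two profiles rise and fall together. The names, positions, values, distance window, and correlation cutoff are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric profiles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-variation signal
    return cov / sqrt(var_x * var_y)


def link_enhancers(enhancers, genes, max_distance=125_000, min_corr=0.8):
    """Pair each enhancer with nearby genes whose expression tracks its activity.

    max_distance and min_corr are illustrative cutoffs, not published values.
    """
    links = []
    for e_name, (e_pos, e_activity) in enhancers.items():
        for g_name, (g_pos, g_expr) in genes.items():
            if abs(e_pos - g_pos) > max_distance:
                continue  # only consider neighboring genes
            r = pearson(e_activity, g_expr)
            if r >= min_corr:
                links.append((e_name, g_name, round(r, 3)))
    return links


# Hypothetical data: one value per cell type (five cell types here).
enhancers = {
    "enh_A": (100_000, [1, 1, 0, 0, 0]),
    "enh_B": (150_000, [0, 0, 1, 1, 0]),
}
genes = {
    "gene_X": (120_000, [9.0, 8.0, 1.0, 0.5, 1.0]),
    "gene_Y": (5_000_000, [0.5, 1.0, 7.0, 8.0, 0.5]),
}

links = link_enhancers(enhancers, genes)
print(links)
```

In this toy data only enh_A is linked to gene_X: enh_B sits nearby but is active in the wrong cell types, and gene_Y is too far away to count as a neighbor.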
The new decorations of the non-coding genome and the associated links between regulators and their targets may be particularly useful for scientists studying the genetic basis of disease using genome-wide association studies. In these, researchers search the complete human genome looking for DNA variations that are more common in patients with a complex disease, such as diabetes, than in the rest of the population.
With a disease-linked gene in hand, one hope is that the information will stimulate design of new therapies. “But it turns out that most associated DNA variants reside within the non-coding portions of the genome,” says Bernstein, explaining that this sobering fact makes sense: it is likely that complex interactions among many genes and elements determine an individual’s risk of developing a particular complex disease.
Although it might be concluded that disease variants in non-coding DNA aren’t meaningful or therapeutically useful, Bernstein and Kellis’ maps suggest the opposite. Their maps show that many gene-activating histone marks sit on top of non-coding parts of the genome that genome-wide association studies have linked to disease, suggesting that variants in the genomic hinterlands alter gene expression.
Kellis points out instances where a genetic change common among patients with a particular disease disrupts a predicted regulator of an enhancer region specific to a type of cell involved in that disease. “This provides immediate mechanistic hypotheses for the disease, and even specific regulatory pathways that can become targets for drug development,” he says. “The implications for human health and personalized medicine are tantalizing.”
These patterns make biological sense as they link specific diseases to the relevant cell types. For example, many sequence variants associated with characteristics of blood cells align with activating histone marks in leukemia cells, while DNA anomalies associated with the autoimmune disease lupus lie under enhancer marks in immune cells.
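The comparison described here amounts to asking, for each cell type, how many disease-associated variants fall inside regions marked as enhancers. A minimal sketch of that overlap count follows, assuming a single chromosome, half-open intervals that do not overlap one another, and entirely made-up coordinates; it is an illustration, not the analysis used in the study.

```python
from bisect import bisect_right


def count_variants_in_intervals(variant_positions, intervals):
    """Count variants that fall inside any half-open [start, end) interval.

    Assumes the intervals do not overlap one another.
    """
    ordered = sorted(intervals)
    starts = [start for start, _ in ordered]
    hits = 0
    for pos in variant_positions:
        # Find the last interval that starts at or before this position.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < ordered[i][1]:
            hits += 1
    return hits


# Hypothetical enhancer intervals per cell type and hypothetical variant positions.
enhancer_maps = {
    "immune": [(100, 200), (500, 600)],
    "liver": [(1000, 1100)],
}
lupus_variants = [150, 550, 1050, 3000]

overlap_counts = {
    cell_type: count_variants_in_intervals(lupus_variants, intervals)
    for cell_type, intervals in enhancer_maps.items()
}
print(overlap_counts)
```

A real analysis would also need to ask whether a count is higher than expected by chance, for example by comparing against randomly placed variants, which this sketch leaves out.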
“Our work suggests that disease variants that aren’t changing the structure of a protein are instead regulating the amount of protein produced in a relevant cell type,” says Bernstein. “By connecting non-coding DNA variants to genes and regulators, the model can guide the interpretation of association studies and perhaps lead to a better understanding of the disease at a molecular level.”