In the hallways at scientific meetings, countless scientists have debated the same question: "How many species need to be sequenced to know whether evolution has conserved a given stretch of DNA?" One computational biologist, Sean Eddy, has done the math.
In the January 2005 issue of PLoS Biology, Eddy, who at the time was an HHMI investigator at Washington University School of Medicine in St. Louis, described a mathematical model that puts hard numbers on the need to keep sequencing genomes from diverse organisms. According to the model, in order to detect invariant single nucleotides conserved in evolution, researchers would need to compare about 17 genomes separated by the average evolutionary distance between humans and mice.
When conserved nucleotides are allowed to change at a faster, more realistic rate, more than two dozen genomes are needed to do the same job. To reduce the error rate from 1 in 100 to 1 in 10,000, about 120 such genomes should be compared. Notably, far fewer genomes are required to detect conserved features larger than a single nucleotide.
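To see why the number of genomes climbs as the tolerated error rate falls, consider a deliberately simplified sketch. It is not Eddy's published model and is not expected to reproduce the figures above. It assumes a Jukes-Cantor substitution process, treats each comparison genome as an independent test against a reference, and counts only false positives (neutral sites that happen to look unchanged). The function names and the `distance` parameter (expected substitutions per site) are illustrative choices, not terms from the paper.

```python
import math


def prob_unchanged(distance: float) -> float:
    """Probability that a neutral site shows the same base in two genomes
    separated by `distance` expected substitutions per site, under the
    Jukes-Cantor model (an assumption of this sketch)."""
    return 0.25 + 0.75 * math.exp(-4.0 * distance / 3.0)


def genomes_needed(distance: float, false_positive_rate: float) -> int:
    """Smallest number N of independent comparison genomes such that a
    neutral site appears unchanged in all N with probability at most
    `false_positive_rate`. Ignores false negatives and shared ancestry."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    p = prob_unchanged(distance)
    # Solve p**N <= false_positive_rate for the smallest integer N.
    return math.ceil(math.log(false_positive_rate) / math.log(p))
```

Under these assumptions the required number grows with the logarithm of the error rate, so a call such as `genomes_needed(0.5, 1e-4)` asks for roughly twice as many genomes as `genomes_needed(0.5, 1e-2)`. The distance value 0.5 is arbitrary. A fuller treatment, like the one the article describes, would also account for missed conserved sites and for conserved sites that still change slowly.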
For Eddy, huge numbers—not to mention the letters A, T, C, and G—are all in a day's work. He designs mathematical tools to probe modern genomes, unraveling the structures, functions, and histories of specific genes. In particular, Eddy works to identify genes that make functional RNAs instead of proteins.
To identify noncoding RNA (ncRNA) genes, for example, Eddy's lab has developed statistical models describing the pattern of mutation expected in these genes versus the different mutation patterns commonly found in protein-coding genes. With these models, they can scan genome regions to find areas that likely contain ncRNA genes. Eddy's team has used this approach to predict a few hundred new RNA genes in the genome of the bacterium Escherichia coli, some of which they've confirmed experimentally. The lab is also conducting computational screens for new ncRNA genes in humans, nematodes, and yeast, among other organisms. Because ncRNA genes tend to be small and are inherently immune to frameshift or nonsense mutations, they are hard to find by classical mutational genetic screens. They are also difficult to recognize in genomes because they do not have open reading frames and thus cannot be discovered by conventional gene-finding programs.
As a group leader at Janelia Farm, Eddy will bring his research program—and considerable computational savvy—to the new campus. He anticipates being a high-tech service provider, fulfilling some of the needs of the other group leaders, such as comparing given gene sequences across genomes.
Although Eddy does not yet have a defined research plan for his Janelia Farm lab, he's planning to branch out—or back, as the case may be. As a postdoctoral fellow at the MRC Laboratory of Molecular Biology in Cambridge during the 1990s, Eddy originally planned to study the neural circuits behind basic behaviors such as learning and memory, but his research project got scooped by a colleague with similar interests.
"My dream is to switch back to what I set out to do in the first place," Eddy says. "At Janelia Farm, I can integrate my work with neurobiology, and I can take my time in developing a research program." He's also drawn to Janelia Farm for the promise of a culture of collaboration and focused science. "The Farm is unique because it will have a whole research enterprise designed, from the ground up, to invent new technologies alongside science."
RESEARCH ABSTRACT SUMMARY:
Sean Eddy develops methods and software tools for large-scale analysis of genes and genomes.
Photo: Paul Fetters