
The secret of her success was an archive of Sydney Brenner's work, a gold mine of many of the Nobel laureate's laboratory notebooks and thousands of his electron-microscopy (EM) images, mostly unpublished, on C. elegans anatomy. Chen spent several months poring over a dozen laboratory notebooks and more than 10,000 electron micrographs at the Albert Einstein College of Medicine's worm image archive, painstakingly reconstructing the neural connections that will inform her own research in the lab of Dmitri Chklovskii, an assistant professor at Cold Spring Harbor Laboratory and incoming group leader at HHMI's Janelia Farm Research Campus.
Chen prevailed because the Brenner archive was safely in the hands of David H. Hall, director of the Center for C. elegans Anatomy at Einstein, after it had sat moldering in boxes at the Medical Research Council (MRC) Laboratory of Molecular Biology in Cambridge, England, for more than 15 years. Hall had long cajoled people at MRC, convincing them to let him become the keeper of all that potentially useful worm image data.
There was a time when EM was a workhorse of biological research. But in the early 1980s, the genome revolution forced a radical change as scientists abandoned EM slides for DNA sequencing gels in a quest to get at the genetic secrets of C. elegans. Lost in this shift of resources was a massive amount of primary data, including maps of the complete neural circuitry of C. elegans collected mainly by Brenner and John G. White, now at the University of Wisconsin-Madison, while at MRC.
“By the mid-1990s, I was the only person left who could make sense of the records,” says Hall, an expert on C. elegans anatomy. “It would all have been lost. I took a personal interest to make sure that didn't happen. My data sets and the MRC data sets were extremely complementary. It just made sense to put them together in one place.”
The research community has lately come full circle, however, because scientists are now eager to connect their molecular data to the detailed anatomical studies that Brenner, White, and their colleagues labored over for years. Researchers are flocking to access Hall's treasure trove of data; he gets 20 to 30 visitors and thousands of Web site hits per week. Hall is in the process of digitizing as many of the approximately 200,000 images as possible, with about 5,000 now available at his Web site (www.wormimage.org).
“We couldn't have done everything,” says Brenner about cataloging the collection. “There was too much data there for the one project. But it testifies to the integrity of the result that it can be used over and over again. And it shows the importance of keeping primary data where others can use it.”
Remarkably, no one besides Hall seems to have foreseen that the worm images would become so valuable. Although most scientists would agree that primary data should be saved, in some cases data can become outdated to the point that no one can interpret them.
“There will always be a need to go back and look at primary data,” says John Spieth, group leader at the Genome Sequencing Center at Washington University in St. Louis School of Medicine. “There is no way we can have the foresight to know what will be important 10 or 50 years from now.”
But saving data goes well beyond collecting a pile of graduate student notebooks and theses on a dusty top shelf. A quiet crisis looms in many labs as the volume of data generated by large-scale science grows at an alarming rate. Individual laboratories are struggling to find efficient and economical ways to store and retrieve key data. Many researchers have coped alone thus far, but some are now looking to large centralized archiving systems to bear part of the burden. The rapid pace of technology development has, for example, prompted Spieth to resequence parts of the C. elegans genome rather than rely on decade-old sequence data produced by technology that is now considered antiquated. Reacquiring data may work for large centralized data centers, but in individual labs, changing technology has often meant keeping antiquated equipment around so that data are not lost.