HHMI Bulletin
May '06
FEATURES: There's Gold In Those Archives

PAGE 3 OF 5

Constance Cepko: The challenge of storing the vast amount of data generated by her research has her searching for commercial solutions.

“Computer hardware and software quickly become obsolete, so that unless you hold on to your old computers the data you backed up with them may become difficult if not impossible to recover,” says Terrence J. Sejnowski, a computational neuroscientist and HHMI investigator at the Salk Institute for Biological Studies in La Jolla. “It's something we have to live with.”

Archiving Large Databases
Retaining all that material is easier said than done, however.

“It's a problem for everybody,” says HHMI investigator Constance L. Cepko, a neurobiologist at Harvard Medical School who studies the structure and function of the eye in vertebrates. “In trying to link DNA clones, in-situ images, and microarray data, we can generate 30,000 data points in one experiment.” She and her colleagues considered commercial data-management packages and high-tech start-up services for archiving such data, but none filled their needs. At present, an M.D.-Ph.D. student is setting up a customized relational database, but it is just a temporary solution.

Cepko says that because the volume of data her lab generates is rapidly filling servers, she is looking to a centralized archiving system, such as the Mouse Gene Expression Database at the Jackson Laboratory (TJL) in Bar Harbor, Maine, to take some of the data off her hands. TJL aims to make the database, funded by the National Institutes of Health, the leading archive of mouse genomic and proteomic data, and is actively soliciting and adding primary data to its curated, annotated database.

In much the same spirit, Sejnowski has an agreement with the San Diego Supercomputer Center, which maintains and archives all of his lab's large data sets. “You have to find a partner,” he insists. “Data have become so unwieldy that managing them is too much for any one lab to handle on its own.”

HHMI investigator Norbert Perrimon, who studies cell signaling at Harvard Medical School, found the solution to his data-management problems—at least, for the time being—by setting up a centralized public database to store the results of his lab's RNA interference screens in Drosophila. Its infrastructure was funded by a grant from the National Institutes of Health, which allowed him to hire two full-time programmers to get the job done.

But in the long run, the solution will depend on cheaper ways of storing data and on being more selective about what is kept, says Perrimon. “The issue that we are facing now is that we do not yet know what is worth keeping in these large-scale studies because the [RNAi] field is not very mature yet. We need to spend more time on data analysis to figure out what has real value in the data sets.” So, for the time being, he is storing it all.

Paul W. Sternberg, an HHMI investigator at the California Institute of Technology, believes the answer may lie in more intelligent searching. “My general feeling is that we know a lot more than we think we do in biology,” he says. “We aren't taking full advantage of what already exists out there. Digital storage is cheap. We should be archiving and making retrievable unpublished primary data.” He is working on systems that will let scientists combine primary data from disparate sources and develop new hypotheses from what he calls “weak hints,” which tend to be overlooked when sources are assessed individually.

In the March 10, 2006, issue of Science, Sternberg and colleagues described how to apply such a computational approach to integrating published data on how genes interact with each other in roundworms, fruit flies, and yeast. “We now know that mining published and available data is valuable,” Sternberg says. “Imagine what we could do if we could access the likely larger amount of unpublished information.”

Sternberg believes this idea also extends to updating that laboratory mainstay, the lab notebook. “The new generation is more comfortable with electronic notebooks,” he says. One of his graduate students keeps a personal blog on the lab's private intranet for recording observations and ideas. “I would have kept that kind of thing in a margin of my [paper] notebook,” says Sternberg. “But then how would I ever find it again? In digital form, you can search and organize thoughts and ideas—and have instant recall.”

Photo: Jason Grow


Related Links

AT HHMI

New Recipe for Discovery: An Online Blend of Worms, Flies, Yeast (03.10.06)
Making the Right Moves: A Guide to Lab Management

ON THE WEB

WormBase
WormAtlas
Mouse Genome Informatics
San Diego Supercomputer Center
Drosophila RNAi Screening Center at Harvard Medical School
The Explorer's Club
PubMed Central
LOCKSS
Scientific Sources Material

© 2012 Howard Hughes Medical Institute. A philanthropy serving society through biomedical research and science education.
4000 Jones Bridge Road, Chevy Chase, MD 20815-6789 | (301) 215-8500 | e-mail: webmaster@hhmi.org