If you're counting on your published articles serving as a record of your research, warns Jeremy Nathans, think again.
Jeremy Nathans, an HHMI investigator at the Johns Hopkins University School of Medicine, remembers searching for an article he considers to be a landmark publication. Citation in hand, he figured the fastest way to find it would be a quick PubMed search to link to the original article, which appeared in the journal Nature in 1978. He found nothing. The article had been missed in the process of adding pre-computer-era articles to the PubMed database, which includes citations and abstracts for virtually all published biomedical literature.
Eventually, Nathans tracked down the article by contacting the author, who scanned the original print document and sent a grainy PDF file. But Nathans was still left with an uneasy feeling. Because scientists rely so heavily on PubMed searches, he reasoned, if it doesn't appear there “it's as if it had never existed.” (Nature has since added that particular article to its electronic archive.)
Research results can also disappear when they are relegated to the ranks of “supplemental data” when a journal article is published. These data are only available online, and do not always print out along with the main article. “A lot of us believe that the best way to store data is by publishing it,” says Nathans. “But now journals are telling us to put so much in supplemental data, and that gets divorced from the published article.”
“This issue of supplemental data is becoming bigger and bigger,” says Edwin Sequeira, policy coordinator for PubMed Central, an electronic complement to PubMed that offers free access to full-text journal articles at the National Library of Medicine. “I see it as an economic decision not to put all of the data into print, but I would argue that if the data are important enough to include at all, they are an integral part of an article and should be treated as such.”
Further, says Sequeira, not all journal publishers include supplemental data when sending their articles for archiving. And if a publisher goes out of business, there is no guarantee that the supplemental materials in its possession will survive. As long as scientists are producing such materials, he argues, they should make sure the journals supply them to PubMed Central along with the articles they complement.
Traditionally, publishers have relied on libraries to maintain long-term archives, but in the digital age that role is in transition. Librarians, publishers, and the scientific community are now grappling with how libraries can continue to store published articles and their supplemental data.
One potential solution is being explored by a consortium organized by Stanford University Libraries. The system, called LOCKSS, uses a Web crawler to collect newly published content from participating publishers, compares the content it has collected with the same content held by other LOCKSS sites, and repairs any discrepancies. Initiated by a small team of librarians and engineers, the system guarantees libraries long-term access to complete content by keeping multiple copies of published data at all participating sites. If one site has a technical problem, its data can be restored from any of the others. Some scientific publishers have begun to buy into the system, which is still in its infancy. To date, 80 major research libraries in the United States and 25 in Europe, as well as others scattered around the world, are participating.
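The repair-by-comparison idea behind LOCKSS can be illustrated in miniature. The sketch below is not the actual LOCKSS protocol (which uses a more sophisticated peer-polling scheme); it is a hedged illustration, with hypothetical site names, of the underlying principle: fingerprint each site's copy, find the version most sites agree on, and restore any divergent copy from that majority.

```python
import hashlib
from collections import Counter

def digest(content: bytes) -> str:
    """Fingerprint one site's stored copy of an article."""
    return hashlib.sha256(content).hexdigest()

def audit_and_repair(replicas: dict[str, bytes]) -> dict[str, bytes]:
    """Compare every site's copy and overwrite any minority
    (presumed damaged) copy with the majority version."""
    tally = Counter(digest(copy) for copy in replicas.values())
    majority_hash = tally.most_common(1)[0][0]
    good_copy = next(c for c in replicas.values()
                     if digest(c) == majority_hash)
    return {site: copy if digest(copy) == majority_hash else good_copy
            for site, copy in replicas.items()}

# Hypothetical example: one site's copy has bit-rotted;
# the audit restores it from its peers.
sites = {
    "site_a": b"article text",
    "site_b": b"article text",
    "site_c": b"articl3 text",   # corrupted copy
}
repaired = audit_and_repair(sites)
```

With many independent sites holding copies, the odds that a majority suffer the same damage at the same time become vanishingly small, which is the security LOCKSS is designed to provide.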
“If publishers go out of business their online resources can vanish,” says Michael Seadle, assistant director for information technology at Michigan State University and a LOCKSS user. “We want to make sure that scholars, 10 or 100 years from now, will still have access to this data. LOCKSS is a way to make sure that published information doesn't disappear, while respecting the publishers' copyright. It's a security policy for everyone.”
Photo: Bill Denison