Loren Frank’s HHMI lab at UCSF has pioneered an ambitious framework for sharing vast neuroscience datasets and complicated analysis methods, a step towards tipping the culture of science towards more effective and fruitful collaboration.
Investigator, University of California, San Francisco

Science holds the promise of tackling some of the world’s greatest problems, from taming new pandemics and climate change to figuring out why brain circuitry goes awry. But science is not without its own thorny problems. In the often-cutthroat race for new discoveries and for prestigious publications and awards, research teams may work in secret for years, collecting reams of data and conducting intricate analyses that are enormously difficult to verify. Without access to the full research environment and data, “it’s almost impossible for one group to replicate the work of others,” says Howard Hughes Medical Institute (HHMI) Investigator Loren Frank at the University of California, San Francisco (UCSF).

Moreover, the fact that science “is a highly competitive, fractured profession is one of the things that slows science down,” says Kristen Ratan, founder of Strategies for Open Science (Stratos), who is working with HHMI on strategies for data sharing. That’s especially problematic as researchers tackle harder and harder questions. Many of today’s complex problems “are no longer solvable by one grad student working in isolation for four or five years,” says Bodo Stern, chief of strategic initiatives at HHMI.

The process of publishing those results can also take years, and the basic format hasn’t changed much in centuries. “The way we publish science is outdated,” says Stern. “The format of an article today looks remarkably similar to that of a Nature article from 1869.”

That’s why there’s now a growing push to change the very culture of science, swapping some of that fierce competition for friendly collaboration and promoting the widespread sharing of datasets and data analyses well in advance of publication. And while culture changes in science are notoriously difficult, a determined band of open science advocates has been making real progress. Stratos, which Ratan founded in 2019, has been working with HHMI researchers like Frank to explore the idea of “collaboration hubs” that was originally pioneered at NASA to interpret the vast amounts of environmental and space data coming from satellites, telescopes, and many other sensors. 

The US government and federal agencies have also signaled their support for open science and collaboration. Indeed, the Biden Administration’s Office of Science and Technology Policy (OSTP) declared 2023 the Year of Open Science. In a landmark 2022 memo, OSTP required agencies “to make publications and their supporting data resulting from federally funded research publicly available.” Meanwhile, the National Institutes of Health has put in place an even stricter data sharing requirement. Says Ratan, “the policy landscape in the US has shifted extraordinarily.”

Making Data More Accessible

Policy changes, however, are only part of the story. Scientists also need to step up not just to release their data, but also to devise ways to make those data and their analysis methods more accessible and comprehensible to would-be collaborators. Fortunately, there’s now an ambitious new example of such a data sharing tool. On January 26, 2024, after five years of nitty-gritty software engineering work, Frank’s lab released a preprint describing a new “data analysis framework for reproducible and shareable neuroscience research” that his team dubs “Spyglass.”

In the Spyglass framework, all the data that Frank’s lab collects from arrays of electrodes inserted into rat brain regions involved in behavior, learning, and imagination, along with detailed information on each animal’s second-by-second behavior, are brought together and stored in a standardized format called Neurodata Without Borders (NWB). Spyglass then provides software code (written in Python, an open-source language) that allows both sharing and analysis not just of the raw data, but also of the results from every step in what is typically a very complex analysis. As the preprint describes, “Spyglass also offers ready-to-use pipelines for analyzing behavior and electrophysiology data, as well as extensive documentation and tutorials for training new users.”

Through a cloud-based data sharing hub that HHMI commissioned, Spyglass is available to anyone, with no need to understand NWB or download software. This is “a real leap in data sharing,” says Ratan, with benefits not just for the neuroscience community but also for Frank’s lab and its direct collaborators. The link to this hub is available through the preprint and offers anyone with enough compute power the ability to conduct their own analyses, changing parameters and assessing the results for themselves. Using the new standardized approach to data collection and analysis, “we’re doing things two to three times faster than we were before,” Frank explains. Adds Stern, “it’s a huge upfront investment that’s now starting to pay off.”

The Spyglass story begins in the late 1990s, when Loren Frank was a graduate student trying to figure out how animal behavior is related to the patterns of neural activity in the animals’ brains. At the time, every member of the lab developed their own methods of processing the data, which could vary by experiment. Frank thought there must be a better way. “I tried to come up with something I could use across multiple subjects,” he recalls.

The initial code he wrote was “kludgy,” he says, “but not terrible.” Over the next couple of decades, he and a grad student rewrote the software to make it more versatile, making it possible to combine data into chunks that showed the brain regions being probed and how the neurons fired depending on what the animals were doing. “That system was useful for many years,” Frank recalls, but it had serious limitations. It could organize the initial data but couldn’t allow data taken at different times or from different researchers to be added in or matched. Nor did it adequately track the many steps in the analysis from raw data to final results, he says. By 2016, “it was clear it wasn’t enough,” Frank says. 

Around that same time, the neuroscience field realized it had a major reproducibility problem. Experiments produced huge amounts of data, and analyses typically had so many hidden steps that they could take years to understand. Those challenges make it hard or even impossible for one lab to build on the work of others, leading to too much duplication and holding back progress in the field. “We spend millions of dollars for someone to collect very complex data, but the next researcher still has to produce their own data,” says Stern.

So in 2019, Frank told the members of his lab that they were going to build a new system. Once it was ready, nothing would be published that didn’t use the new NWB format and the new system. Frank’s proclamation didn’t go over well initially. “People were not thrilled,” he recalls. Graduate students are in a race to accumulate enough data and results to get their PhDs, and post-docs need to churn out papers to advance their careers. So why take the big risk of spending precious time to write complicated software? “They were concerned they would be spending all their time debugging and not being able to do science,” Frank says.

Then the COVID-19 pandemic hit. With labs temporarily shut, Frank was stuck at home with time to write software code and plan the basic structure of what would become the Spyglass framework. He also found an eager partner in Kyu Hyun Lee, who joined the Frank lab at the end of 2020. For Lee’s own PhD, his data had been organized in an “idiosyncratic” way, Lee recalls, and each of the many steps of analysis relied on a different software tool. “It was a bit of a mess,” he says.

So Frank’s vision of a clear, unified framework was particularly appealing to Lee. The COVID lockdown also meant that Lee couldn’t get into the lab to start his own experiments probing the hippocampus and visual cortex in rats to show how the animals can imagine themselves to be in a different location. And the data sharing idea seemed both challenging and rewarding. “One of the reasons I spent so much time working on it was that it was really fun,” Lee says.

Lee is the first author of the Spyglass paper, though he hastens to add that he had plenty of help from his colleagues—including his co-first author Eric Denovellis and Frank. “I’d never met a principal investigator before who still codes,” Lee says. In addition, strong support from HHMI enabled Frank to hire two software engineers, and to take valuable time away from research to develop the new tools. “I’m fortunate to be funded by HHMI,” Frank explains. “It gives us the capacity to do harder things.”

A Game-Changer

The new framework is now in use for all researchers in Frank’s lab, and “it changes the game in terms of what we can do,” he says. “We can take these complicated data streams—the numbers that estimate what the animal is thinking—and relate them to the animal’s behavior,” he says. “Then we can run the standard analysis in a day, which might have taken weeks before.” In addition, says Lee, “if we really make the effort to put the data and the code in a shareable state, we become more confident ourselves in the data and the results.”

For the broader research community, the Spyglass framework is available to anyone in a cloud-based data sharing environment. This data hub is “a first step towards opening a new doorway in science,” says Ratan, who is working with the hub-building technology nonprofit, 2i2c, to test how such data sharing could be done more broadly across many different fields. Systems like it essentially create virtual representations of the entire work of a lab, she explains, enabling others to compare or combine their own data and even to mine the data for new findings.

Ratan and the team at 2i2c envision the creation of “data lakes” or “data bazaars” that pool the work of many researchers, making new scientific leaps more likely. “I believe this kind of collaborative environment is going to be really critical for the progress of science going forward,” says Frank. 

Of course, making data sharing technically possible doesn’t guarantee that it will happen. “Data sharing and culture change in science are hard,” says Frank. Researchers legitimately fear that their own painstakingly collected data could be used by competitors to scoop them. They also may want to keep the data close at hand so that they can wring out the maximum number of papers. Or they may be embarrassed to share their code, if it’s not in great shape or well documented, suggests Lee. “It’s like showing your dirty laundry.”
 
But Stern, Ratan, Frank, and many others are hopeful that a combination of carrots and sticks will successfully swing the culture of science in the direction of a new era of collaboration. The sticks are the new policies that require data sharing. The carrots are the potential rewards from collaboration itself or from creating data sharing tools like Spyglass. Those could include specially targeted grants and scientific recognition, earlier publication, or even new prizes that reward particularly productive collaborations that accelerate the pace of scientific discovery. Ultimately, says Stern, “the hope is that collaboration will become the new competitive strategy.”

And for HHMI, the Spyglass framework offers a model of data collection and sharing that could be adopted by other HHMI teams. Equally important, adds Stern, is the example Frank has set in taking such a big risk to transform how his lab operates. Typically, HHMI’s communications efforts focus on publicizing new and exciting results from HHMI researchers. With the Spyglass preprint, however, Stern sees an important opportunity to also highlight and encourage innovation in how science itself is done. “We want to signal to our scientists that we welcome the exploration of new approaches to data sharing,” he says.