illustration by Leif Parsons

Crunching Big Data

Nowadays, microscopes capture images of the brain in unprecedented detail. But with that detail come mountains of complex data that can slow even the fastest computer to a crawl. On a single machine, “you can load the data, start it running, and then come back the next day,” explains Janelia Group Leader Jeremy Freeman. “But if you need to tweak the analysis and run it again, then you have to wait another night.” For larger data sets, the lag time might be weeks or months.

Freeman teamed up with Janelia Group Leader Misha Ahrens to find another way. The scientists realized that a new distributed computing platform called Spark, which divides tasks across a cluster of computers, was particularly well suited to the challenges of neural data. Building on that technology, Freeman and Ahrens developed an open-source library, dubbed "Thunder," for analyzing large-scale neuroscience data. With their library, tasks that previously took days can be completed in hours or minutes, which makes it well suited to high-throughput, exploratory analysis of large data sets.
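The article does not show Thunder's code. The following is a minimal sketch in plain Python (not Spark's or Thunder's API) of the general data-parallel idea. Each voxel's time series can be analyzed independently, so the records can be split into chunks, scored on separate workers, and then merged. The analysis here, correlating each voxel with a stimulus trace, and every function name (`partition`, `analyze_partition`, `distributed_correlation`) are hypothetical illustrations. Threads on one machine stand in for the nodes of a real cluster.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def pearson(x, y):
    """Pearson correlation between two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def partition(records, n_parts):
    """Split (voxel_key, time_series) records into roughly equal chunks, one per worker."""
    return [records[i::n_parts] for i in range(n_parts)]


def analyze_partition(chunk, stimulus):
    """'Map' step: score every voxel in one chunk, independently of all other chunks."""
    return [(key, pearson(series, stimulus)) for key, series in chunk]


def distributed_correlation(records, stimulus, n_workers=4):
    """Scatter chunks to workers, then gather and merge per-voxel results into one dict.

    Hypothetical helper: threads here merely imitate the nodes of a cluster.
    """
    chunks = partition(records, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(analyze_partition, chunks, [stimulus] * len(chunks))
    merged = {}
    for part in results:
        merged.update(part)
    return merged


if __name__ == "__main__":
    # Toy data: a stimulus that alternates off/on, and three voxels keyed by (row, col).
    stimulus = [0, 1, 0, 1, 0, 1]
    records = [
        ((0, 0), [0.1, 0.9, 0.2, 1.1, 0.0, 1.0]),  # tracks the stimulus
        ((0, 1), [1.0, 0.0, 1.1, 0.1, 0.9, 0.0]),  # anti-correlated with it
        ((1, 0), [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]),  # flat: no response
    ]
    for key, r in sorted(distributed_correlation(records, stimulus, n_workers=2).items()):
        print(key, round(r, 3))
```

Because no chunk depends on any other, adding workers (or, on a real cluster, machines) shortens the wall-clock time without changing the result. That independence is what makes per-voxel neural analyses a natural fit for a platform like Spark.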

In a report published July 27, 2014, in Nature Methods, the Janelia team illustrated Thunder’s capabilities by using it to rapidly identify patterns of biological interest in high-resolution images of the brains of mice and zebrafish.

Thunder is designed to run on a private computer cluster or on Amazon's cloud computing services. It is fully open source; information and tutorials can be found via the project's GitHub page.

In this video, each trace represents neural activity across the brain during one presentation of a moving stimulus, and different colors indicate different directions of motion. By watching the movie, we see how neural activity evolves over time. Credit: Jeremy Freeman, Nikita Vladimirov, Takashi Kawashima, Yu Mu, Nicholas Sofroniew, Davis Bennett, Joshua Rosen, Chao-Tsung Yang, Loren Looger, Philipp Keller, Misha Ahrens.
