Left: Elena Rivas, Right: Lou Scheffer
While that may not sound dramatic, Eddy's protein-matching algorithms are an industry standard, used by researchers as the search tool for a reference library called the Protein Families database, or Pfam. There are roughly 10 million proteins in the database. Luckily, about 80 percent of those sequences fall into a much smaller set of families, and Eddy has designed the analysis software to query for matches in this data set. “When a new sequence comes in, [Pfam] is like a dictionary; it's always being added to,” he says. The database currently identifies 12,000 protein families.
There is also an RNA database called Rfam for which Eddy and his Janelia team have software design and upkeep responsibilities. Eddy has to keep one step ahead of his users, which means stressing his analysis tools to the failure point so he can improve them.
“We set up experiments and try to break the software and push the envelope,” he says.
Both the overseers and the users of the Janelia computing system call it a “cluster” of processors. The cluster serves 350 researchers and support staff and can scale up to serve many more if needed. Its design puts a premium on expandability, flexibility, and fast response, particularly since scientific needs may change rapidly.
The computer cluster was recently upgraded as part of a regular four-year technology refresh. The system is made up of commercially available hardware components built by Intel, Dell, and Arista Networks, and Eddy calls it a “working class supercomputer.” Janelia is the first customer for this particular design; in fact, some of the components have serial number 1 or 2 and are signed by the engineers who built them. The new system is up to 10 times faster than the old one and has six times more memory.
“Janelia's new computing cluster provides a platform that is an order of magnitude more responsive than the previous system and can be grown easily to accommodate changing requirements,” says Vijay Samalam, Janelia's director of information technology and scientific computing.
That expanded capacity is a big help to Janelia fellow Louis Scheffer, an electrical engineer and chip designer by training. He uses the cluster to help researchers map the brain wiring of the common fruit fly Drosophila melanogaster. Essentially, it's a massive three-dimensional image-manipulation challenge. First, slices of brain 1/1,000th the thickness of a human hair are digitally photographed with an electron microscope and stored. In each layer, the computer assigns colors to the neurons so researchers can trace their path. As an example, the medulla of the fly, part of the brain responsible for vision, requires more than 150,000 individual images to create the full mosaic, which is 1,700 layers (slices) deep.
But all these pictures must be knitted together so scientists can follow neural paths and see where they lead. Think Google Earth. As you pan across the globe, data are fed onto the screen so you can “fly” from one location to another, and more images are required as you drill down to examine surface topography. Making the transitions smooth in between images requires fine-tuned alignment. “It's not completely simple—there are a whole bunch of distortions to deal with,” says Scheffer. Some are caused by the electron microscope itself as it dries out the target specimen during imaging.
To align one image to its neighbor takes about one minute of computer time. But once matched, the resulting checkerboard must be stacked and aligned with the mosaic of images above and below. “You need to make about one million comparisons,” Scheffer says. “It would take [a personal] computer four years.” With Janelia's parallel processors on the task, the job is done in a few hours.
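The scale of that job can be checked with a little arithmetic. The Python sketch below uses only the two figures quoted above (about one minute per comparison, about one million comparisons). The worker count of 4,000 is a hypothetical number chosen for illustration, not a figure from Janelia, and the model assumes the comparisons are independent and divide perfectly across processors.

```python
import math

# Figures quoted in the article.
MINUTES_PER_COMPARISON = 1.0   # "about one minute of computer time"
COMPARISONS = 1_000_000        # "about one million comparisons"

# Hypothetical: the article does not say how many processors the cluster has.
ASSUMED_WORKERS = 4_000


def serial_years(comparisons: int, minutes_each: float) -> float:
    """Time for one processor to run every comparison back to back, in years."""
    total_minutes = comparisons * minutes_each
    return total_minutes / (60 * 24 * 365)


def parallel_hours(comparisons: int, minutes_each: float, workers: int) -> float:
    """Idealized wall-clock hours when the work is split evenly across workers."""
    total_minutes = comparisons * minutes_each
    return total_minutes / workers / 60


def workers_needed(comparisons: int, minutes_each: float, target_hours: float) -> int:
    """Smallest worker count that finishes within target_hours (ideal scaling)."""
    total_minutes = comparisons * minutes_each
    return math.ceil(total_minutes / (target_hours * 60))


if __name__ == "__main__":
    print(f"one processor: {serial_years(COMPARISONS, MINUTES_PER_COMPARISON):.1f} years")
    print(f"{ASSUMED_WORKERS} workers: "
          f"{parallel_hours(COMPARISONS, MINUTES_PER_COMPARISON, ASSUMED_WORKERS):.1f} hours")
```

At one minute per comparison, a single processor would need about 1.9 years, which is the same order of magnitude as Scheffer's four-year estimate for a personal computer; a desktop machine would plausibly run each comparison more slowly than a cluster node. Under the same idealized assumptions, a few thousand workers bring the job down to a few hours.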
But Scheffer's matching is just the first step in the image-manipulation process. Janelia software engineer Philip Winston takes the processed pictures and does the unthinkable: he chops them up again. He creates smaller “tiles” of the photos, which can be more easily added to and removed from a computer screen as a researcher pans across an image. “To open a single image would take five minutes if you didn't tile them,” says Winston. Only 20 tiles are required on the screen at any one time. Currently, Winston is working with four million tiles as part of the Janelia Fly Electron Microscope project to map the entire brain of the fruit fly.
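The idea behind tiling can be illustrated with a short Python sketch. It is not Winston's software: the tile size, the viewport dimensions, and the function name `visible_tiles` are all assumptions made for this example. It shows how a viewer can work out which few tiles overlap the screen and load only those, instead of opening an entire image.

```python
# Hypothetical tile edge length in pixels; the article does not give one.
TILE_SIZE = 512


def visible_tiles(view_x, view_y, view_w, view_h, tile_size=TILE_SIZE):
    """Return (row, col) indices of every tile that overlaps the viewport.

    view_x, view_y: top-left corner of the viewport in image pixels.
    view_w, view_h: viewport width and height in pixels.
    """
    first_col = view_x // tile_size
    last_col = (view_x + view_w - 1) // tile_size
    first_row = view_y // tile_size
    last_row = (view_y + view_h - 1) // tile_size
    return [
        (row, col)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


if __name__ == "__main__":
    # A 1920 x 1080 screen positioned somewhere inside a much larger mosaic.
    tiles = visible_tiles(view_x=5000, view_y=2000, view_w=1920, view_h=1080)
    print(len(tiles), "tiles needed, from", tiles[0], "to", tiles[-1])
```

With these assumed numbers the viewport overlaps 5 columns and 4 rows of tiles, so only 20 tiles need to be in memory however large the full mosaic is. As the researcher pans, tiles that leave the viewport can be dropped and newly exposed ones loaded.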
Photos: Paul Fetters