Computational Biology, Molecular Biology
University of California, Santa Cruz
Dr. Haussler is also a professor of biomolecular engineering at UC Santa Cruz, scientific director of the UC Santa Cruz Genomics Institute, scientific co-director of the California Institute for Quantitative Biosciences (QB3), and a consulting professor at both Stanford Medical School and the UC San Francisco Biopharmaceutical Sciences Department.
The Evolution of Genomes
David Haussler is developing computational and experimental tools for exploring how human and other vertebrate genomes have evolved. By comparing DNA from different species, he can determine how genes’ structure, function, and regulation have changed over evolutionary time. At the same time, he is empowering researchers to access and share genomic data, ensuring that they can quickly test new ideas and build on discoveries from other labs.
As part of the International Human Genome Sequencing Consortium, Haussler’s lab developed the software to assemble the first draft of the human genome sequence. On July 7, 2000, he and his team posted the first working draft of the human genome sequence on the Internet, ensuring the data would be freely available to researchers worldwide. The team also developed the computational methods that are now routinely used to find genes in DNA sequences. Every day, thousands of scientists access information about genomes through the UCSC Genome Browser built by Haussler’s lab.
Using these computational tools, Haussler’s group made the first accurate estimates of what fraction of DNA in the human genome is under the influence of Darwinian selection. They also demonstrated that mobile genetic elements called transposons contributed extensively to the evolution of our genome.
Haussler also applies genome-scale evolutionary and genotype-phenotype association analysis to study cancer and other human diseases. He created the Cancer Genomics Hub, the first comprehensive database of cancer genomes, and has developed algorithms to decode complex rearrangements of DNA in cancer cells. Haussler has used these methods to analyze genomic changes in gliomas and other cancers as part of the NIH’s Cancer Genome Atlas Project.
David Haussler studies the vast regions of the human genome that do not code for proteins. Although these noncoding regions were once called junk DNA, it is now clear that this “junk” has biological relevance and is important for regulating gene activity. Haussler, who is an expert in using computers to understand the enormous amount of data in the genome, wants to learn how these noncoding regions control protein-coding genes.
Haussler studied art and then psychotherapy before settling on mathematics as a college major. During summers, he worked for his brother Mark, a biochemist at the University of Arizona. In his brother’s research lab, Haussler helped carry out experiments on chicks deprived of vitamin D and then analyzed the results. The experience helped him realize he was far more interested in pursuing the mathematics of life than laboratory work.
Haussler was drawn to the mathematical analysis of DNA while pursuing a doctorate at the University of Colorado. At the time, in the early 1980s, the field of bioinformatics was virtually unknown, but Haussler wanted to be a part of it.
Early in his career, Haussler introduced the use of powerful statistical models to find genes that code for proteins in long stretches of DNA sequence. Based on this work, his laboratory was recruited to develop computer algorithms to locate protein-coding genes for the public Human Genome Project. At the time, the project was in a race with a private company named Celera to complete the first draft of the human genome. It all came down to assembling hundreds of thousands of DNA fragments, produced by sequencing machines in laboratories around the world, into a coherent genome sequence. At the eleventh hour, a graduate student in the lab, James Kent, created a 10,000-line computer program in just a few weeks that was able to assemble the human DNA for the public project. On July 7, 2000, Haussler and his colleagues posted this assembly, the first working draft of the human genome sequence, on the Internet. “That moment, when the flood of A’s, C’s, T’s, and G’s of the human genome sequence came across my computer screen on its way to the Internet to reach thousands of scientists all over the world, was the most exciting moment of my career,” he recalls.
Haussler and his team then developed the UCSC Genome Browser, an interactive web-based microscope that allows scientists to view annotated genome sequences of humans and other organisms at any level, from a complete chromosome to a single nucleotide. They also created algorithms and software to analyze and display the genetic differences that arose among organisms during the course of evolution. Haussler’s group reconstructed, with an estimated 98 percent accuracy, a significant part of the genome of the common ancestor of most placental mammals, a small shrew-like creature that lived some 100 million years ago. They have also begun assembling a database that can trace the changes in any given nucleotide from the common placental ancestor to humans or other living mammals.
Despite uncertainties about the genome’s full meaning, Haussler is committed to its exploration. “We didn’t understand the human genome sequence the day the draft was posted on the Internet, and we still don’t understand it today,” he says. “What drives me to keep exploring the genome is the same thing that drives most scientists: curiosity and the excitement of the unknown.”