Torrents of new biological data threaten to flood our minds. The human genome contains on the order of 40,000 genes, and about three times as many proteins. The number of interactions among human proteins far exceeds 500,000. The metabolism of a human cell comprises more than a thousand metabolites connected by several thousand reactions. My lab develops computational methods that will enable researchers to build multiscale, map-like representations of how cellular processes are organized. This cartographic approach will enable researchers to present information in a hierarchical, and thus scalable, manner. Even as the amount of information increases, the representation will be able to extract the small subset of information that is relevant at the scale of interest. A scalable "cartographic representation" of cellular and molecular processes will also enable researchers to design or reengineer biological systems for therapeutic purposes.
The algorithms that my lab developed have enabled us to rediscover, automatically, features of metabolic networks that took earlier researchers decades to uncover. I applied the cartographic approach to the study of the organization of the metabolism of 12 organisms selected from among the three domains of life. Remarkably, our algorithms classified about 90 percent of the metabolites in these organisms as unimportant; they participate in a small number of distinct reactions that belong primarily to a single pathway. This large fraction of unimportant metabolites indicates a weak signal-to-noise ratio. The important metabolites are a small fraction of all metabolites, and thus difficult to identify. However, once they are found, enormous insight is gained into how metabolism is organized and what components are important.
In contrast to current beliefs about the importance of hubs, I found that "connector" metabolites (metabolites that participate in reactions involving metabolites from other pathways) are significantly more conserved across species than "provincial hub" metabolites (which can participate in a much larger number of reactions, but primarily reactions involving metabolites from a single pathway).
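The distinction between connector and provincial hub metabolites can be made concrete with a small sketch. The text above does not name the measure used, so the participation coefficient below (one common way to quantify how evenly a node's links are spread across modules), the toy network, and every identifier are assumptions chosen for illustration, not a description of the lab's actual algorithms.

```python
from collections import Counter


def participation_coefficient(graph, module_of):
    """Return P_i = 1 - sum_s (k_is / k_i)^2 for every node with at least one link.

    k_i is the degree of node i and k_is is the number of its links that
    lead to nodes in module s. P_i is 0 when all links stay in one module
    and grows as the links spread over several modules.
    """
    scores = {}
    for node, neighbors in graph.items():
        degree = len(neighbors)
        if degree == 0:
            continue  # isolated nodes have no defined score
        links_per_module = Counter(module_of[n] for n in neighbors)
        scores[node] = 1.0 - sum((k / degree) ** 2 for k in links_per_module.values())
    return scores


# Hypothetical toy network: nodes stand in for metabolites, modules for pathways.
toy_graph = {
    "hubA": {"a1", "a2", "a3", "a4"},  # many links, all inside pathway A
    "a1": {"hubA", "conn"},
    "a2": {"hubA"},
    "a3": {"hubA"},
    "a4": {"hubA"},
    "conn": {"a1", "b1"},              # few links, but they bridge pathways A and B
    "b1": {"conn", "b2"},
    "b2": {"b1"},
}
toy_modules = {
    "hubA": "A", "a1": "A", "a2": "A", "a3": "A", "a4": "A", "conn": "A",
    "b1": "B", "b2": "B",
}

scores = participation_coefficient(toy_graph, toy_modules)
for name in ("hubA", "conn"):
    print(name, "degree:", len(toy_graph[name]), "participation:", scores[name])
```

In this toy example the node with the most links, `hubA`, scores zero because all of its links stay within one pathway, while the lower-degree `conn` scores higher because its links are split between two pathways. This is the sense in which a connector can matter more than a hub despite having fewer links.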
Evolution of Cellular Networks
The cartographic representations my lab is developing will enable researchers to quantify the differences in topological position and composition of functional modules fulfilling identical functions in different organisms. I plan to use these quantitative measures of the functional and evolutionary distances between different modules both within and across organisms to extract the principles determining the emergence and evolution of functional modules in different types of cellular networks.
Complex Networks and Graph Theory
The cellular processes studied by biomedical researchers comprise a large number of heterogeneous components. These components are connected through a web of interactions that defines a graph or network. The study of networks and graphs dates back to the 1700s and to Leonhard Euler's work on the Königsberg bridges puzzle. More recently, special attention has been given to random networks. Random networks form the "maximally disordered" end of a spectrum of possible network topologies. At the spectrum's opposite end, one encounters fully ordered, finite-dimensional lattices. Although the analysis and representation of random and ordered networks are straightforward, significant challenges exist when other network types are considered.
The complex networks we find in cellular processes and other real-world systems are neither random nor ordered. The significance of these more general classes of networks was first demonstrated for social systems by Stanley Milgram, but other important work has since been conducted by other social scientists, including Linton Freeman, Mark Granovetter, and Harrison White. Recently, there has been renewed interest in characterizing and modeling complex real-world networks. My lab is developing new frameworks for the study of the complex networks found in biological, technological, and social contexts.
This research was supported in part by grants from the National Institute of General Medical Sciences, the National Science Foundation, and the W.M. Keck Foundation.
As of August 30, 2010