Cellular Bioengineering: Systems and Synthetic Biology
Summary: Adam Arkin is interested in the evolutionary design principles of cellular regulatory networks and how these principles aid in the prediction, control, and design of cellular behaviors. His lab develops physical theory and computational tools for understanding cellular processes such as gene expression, signal transduction cascades, and cytomechanics. The lab also analyzes genomic data relevant to the dynamics of regulatory networks in a number of viral, bacterial, and eukaryotic systems, and performs experiments to test the theories.
The goal of my laboratory is to develop a coherent computational and experimental framework for systems and synthetic biology—fields that aim to understand the principles of coordinated function of cellular networks and their design. The success of large-scale sequencing projects has given us an unprecedented look at the broadly conserved and species-specialized parts of the cellular program; genomes provide a view into the basis of cellular behavior. If the genome does encode the full description of an organism and its capabilities, however, it is only by a series of indirect implications. The genome is a program executed by complex dynamic interactions among the cell's molecular constituents. It is precisely the dynamics and evolution of these interactions that we study.
That evolution efficiently reuses "useful" cellular subsystems and converges on network designs for specific functions is suggested by patterns of conservation ranging from protein domains to bacterial operons, large groups of interacting genes, and recurrent topologies of interactions among biomolecules. We use comparative genomics to uncover conserved groups of genes that may function as modules. Functional genomics and more focused experiments allow us to confirm these evolutionary modules and to deduce functional ones. Then, using physical-chemical theory, we make predictions about how these modules function dynamically in the chemically nonclassical cellular environment. From these predictions, we compile integrated models, at different levels of abstraction, of the dynamics and control of cellular pathways and behaviors. Adding evolutionary game theory lets us predict the conditions under which certain network behaviors are selected. Finally, to test and apply our understanding of cellular network design principles, we design and implement synthetic biological circuitry that performs specified functions.
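One simple way comparative genomics can flag candidate modules is phylogenetic profiling: genes whose presence/absence pattern across a set of genomes is identical tend to be gained and lost together, which hints at joint function. The sketch below groups genes by exact profile match. It is an illustration of the general idea, not our pipeline; the gene names, genome names, and presence data are made up, and real analyses usually allow near matches rather than exact ones.

```python
from collections import defaultdict


def profile_modules(presence, genomes, min_size=2):
    """Group genes whose presence/absence profile across `genomes` is identical.

    presence: dict mapping gene name -> set of genomes containing that gene.
    Returns sorted lists of gene names for groups with >= min_size members.
    """
    groups = defaultdict(list)
    for gene, found_in in presence.items():
        # Binary phylogenetic profile: 1 if present in that genome, else 0.
        profile = tuple(1 if g in found_in else 0 for g in genomes)
        groups[profile].append(gene)
    return sorted(sorted(genes) for genes in groups.values() if len(genes) >= min_size)


# Hypothetical toy data: four genomes, six genes.
genomes = ["genome1", "genome2", "genome3", "genome4"]
presence = {
    "a1": {"genome1", "genome2", "genome4"},
    "a2": {"genome1", "genome2", "genome4"},
    "a3": {"genome1", "genome2", "genome4"},
    "b1": {"genome3"},
    "b2": {"genome3"},
    "c1": {"genome1", "genome2", "genome3", "genome4"},  # found everywhere; no partner
}
print(profile_modules(presence, genomes))
# [['a1', 'a2', 'a3'], ['b1', 'b2']]
```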
Physical Theory and Computation in Cellular Systems
Physically, the cellular environment violates many of the assumptions that are made while deriving the classical laws of chemical kinetics and thermodynamics. The system is decidedly not at equilibrium (or even steady state). It is a molecularly crowded environment. Many of the processes, such as gene expression, are governed by small numbers of molecules or slow reactions so that the discrete and stochastic nature of the reactions must be accounted for. Worse still, the internal networks of biochemical interactions are extraordinarily complex in size and topology, distributed in space, governed by nonlinear processes, and rife with complex feedback structures.
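When molecule counts are small, the standard way to respect the discrete, stochastic character of reactions is Gillespie's stochastic simulation algorithm (SSA), which draws an exponential waiting time to the next reaction and then picks which reaction fires in proportion to its propensity. The sketch below applies it to a minimal, hypothetical birth-death model of a single protein: constant production at rate k and first-order degradation at rate gamma. For this model the stationary copy number is Poisson with mean k/gamma, so the time-averaged mean and variance should both come out close to k/gamma. The parameter values are arbitrary choices for illustration.

```python
import random


def gillespie_birth_death(k, gamma, t_end, n0=0, rng=None):
    """Exact SSA trajectory for 0 -> X (propensity k) and X -> 0 (propensity gamma * n)."""
    rng = rng if rng is not None else random.Random()
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_birth = k
        a_death = gamma * n
        a_total = a_birth + a_death  # always > 0 because k > 0
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick the reaction that fires, in proportion to its propensity.
        if rng.random() * a_total < a_birth:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


def time_averaged_moments(times, counts, t_end, t_burn):
    """Time-weighted mean and variance of the copy number after a burn-in period."""
    total = s1 = s2 = 0.0
    for i, n in enumerate(counts):
        # State counts[i] holds from times[i] until the next event (or t_end).
        start = max(times[i], t_burn)
        stop = times[i + 1] if i + 1 < len(times) else t_end
        w = stop - start
        if w > 0:
            total += w
            s1 += w * n
            s2 += w * n * n
    mean = s1 / total
    return mean, s2 / total - mean * mean


times, counts = gillespie_birth_death(k=10.0, gamma=1.0, t_end=2000.0, rng=random.Random(1))
mean, var = time_averaged_moments(times, counts, t_end=2000.0, t_burn=50.0)
print(f"mean={mean:.2f} var={var:.2f} (Poisson theory: both equal k/gamma = 10)")
```

A deterministic rate equation for the same model would report only the mean; the variance is exactly the information that discrete, stochastic simulation adds.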
Faced with this complexity, we define recurrent network motifs (topologies of reaction networks that occur often in biological systems) and then analyze these smaller "modules" for dynamic function. We believe evolution converges on certain topologies because they are particularly efficient at implementing desired dynamics (such as a switch or a pulse) and because they are easy to create, and to evolve plastically, from existing biological parts. These motifs are identified through the detailed study of particular cellular systems, such as the stress response in Bacillus subtilis, adhesion in Escherichia coli, viral dynamics in λ-phage and HIV-1, and signal transduction in immune cells. Even simple motifs are capable of sophisticated information processing: a simple second-order reaction can behave analogously to a band-pass filter (a device that allows signals within a defined frequency range to pass while attenuating others) for upstream dynamics, and an enzymatic futile cycle can show true noise-induced bistability when there is only a little noise on the forward enzymatic reaction.
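In its simplest form, motif finding means counting how often a small subgraph occurs in a network and then comparing that count with counts in randomized networks. The sketch below covers only the counting step, for one classic three-node topology, the feed-forward loop (X regulates Y, and both X and Y regulate Z), on a hypothetical edge list. The randomization step needed to judge statistical significance is omitted, and this is not the motif-finding module described later in this profile.

```python
from collections import defaultdict


def count_feed_forward_loops(edges):
    """Count node triples (x, y, z) with directed edges x->y, y->z and x->z."""
    succ = defaultdict(set)
    for src, dst in edges:
        if src != dst:  # self-regulation is not part of this motif
            succ[src].add(dst)
    count = 0
    for x in list(succ):
        for y in succ[x]:
            for z in succ.get(y, ()):
                # Without self-loops, z differs from x and y automatically.
                if z in succ[x]:
                    count += 1
    return count


# Hypothetical regulatory edges (regulator, target).
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("B", "D"), ("A", "A")]
print(count_feed_forward_loops(edges))  # 2: A->B->C with A->C, and B->C->D with B->D
```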
Other motifs show extraordinary functional versatility: a common bicistronic operon antagonist/agonist module, found often in B. subtilis stress response and other bacterial pathways, can be a graded or bistable switch, a pulse generator, or an oscillator, depending on a few key kinetic parameters. Other kinetic parameters tune the quantitative behavior of the module within one of these possible functions. Perhaps this motif is common because it is relatively simple for evolution to tune both the qualitative and the quantitative behavior of this circuit to adapt cells to new surroundings. That is, it may be selected for evolvability. (The Defense Advanced Research Projects Agency and the Department of Energy provided partial funding for this research.)
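How a single kinetic parameter can flip a module between graded and bistable behavior is easiest to see in an even simpler circuit. As a hedged stand-in (a generic one-gene positive-feedback model, not the bicistronic antagonist/agonist module itself), the sketch below counts the steady states of dx/dt = a + x^n/(1 + x^n) - d*x by scanning for sign changes on a fine grid. With the assumed values a = 0.01 and d = 0.5, a Hill coefficient n = 1 gives one steady state (a graded response), while n = 4 gives three: two stable states separated by an unstable one.

```python
def rate(x, a, d, n):
    """dx/dt for a toy self-activating gene: basal rate + Hill-type feedback - decay."""
    return a + x**n / (1.0 + x**n) - d * x


def count_steady_states(a, d, n, x_max=10.0, steps=100_000):
    """Count roots of rate(x) on [0, x_max] by detecting sign changes on a grid."""
    roots = 0
    dx = x_max / steps
    prev = rate(0.0, a, d, n)
    for i in range(1, steps + 1):
        cur = rate(i * dx, a, d, n)
        if prev * cur < 0:
            roots += 1
        prev = cur
    return roots


for n in (1, 4):
    print(n, count_steady_states(a=0.01, d=0.5, n=n))
# 1 1
# 4 3
```

Changing n alters the qualitative behavior (how many states the circuit can hold), whereas changing a or d within the bistable range mainly shifts where those states sit, echoing the split between "qualitative" and "quantitative" parameters described above.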
Functional genomics measures, in a nearly global way, certain features of the genes, proteins, and metabolites of an organism. These features may be molecular abundances such as mRNA levels measured with gene expression microarrays, molecular interactions such as those determined by interaction trap methods, or induced growth sensitivities such as those obtained from yeast haploinsufficiency profiling (HIP). Most of these technologies produce noisy data with relatively large numbers of false negatives and false positives. Robust statistical methods must be built to identify the significant features in a given experiment and to classify them across experiments. In many of our projects we are now generating large amounts of microarray, proteomic, and metabolomic data and have to confront the analysis and management of these data types. The most mature analyses we have developed (in collaboration with Michael Jordan, University of California, Berkeley) are for microarray analysis of the HIP experiments performed by Guri Giaever and Ronald Davis (Stanford University). We have developed experimental designs and robust statistical graph models for estimating the growth defects of deletion strains of yeast exposed to a stressor (such as a drug treatment) and for simultaneously classifying experimental conditions and gene functions over a corpus of such experiments. (The National Institutes of Health provided partial support for this research.)
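Because such data are noisy and prone to outliers, a natural first step is to use estimates of location and spread that a few extreme values cannot drag around. The sketch below is a generic example of that idea, not the statistical models described above: it flags deletion strains whose log2 ratio (treated versus control) is unusually low, using a median/MAD robust z-score. The strain names, values, and the cutoff of -3.5 are hypothetical choices.

```python
import statistics


def robust_z_scores(values):
    """Robust z-scores: (x - median) / (1.4826 * MAD).

    The factor 1.4826 makes the median absolute deviation (MAD) a consistent
    estimate of the standard deviation when the bulk of the data is normal.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    scale = 1.4826 * mad
    if scale == 0:
        raise ValueError("MAD is zero; robust scale is undefined")
    return [(v - med) / scale for v in values]


def flag_sensitive(log_ratios, cutoff=-3.5):
    """Return strains whose log2(treated/control) lies far below the bulk."""
    names = list(log_ratios)
    z = robust_z_scores([log_ratios[name] for name in names])
    return [name for name, score in zip(names, z) if score < cutoff]


# Hypothetical log2 ratios for deletion strains after a drug treatment.
log_ratios = {"strain1": 0.1, "strain2": -0.2, "strain3": 0.05,
              "strain4": -0.1, "strain5": 0.15, "strain6": -2.5}
print(flag_sensitive(log_ratios))  # ['strain6']
```

Using the median and MAD means the single strongly depleted strain does not inflate the estimated spread and thereby hide itself, which an ordinary mean/standard-deviation z-score would partly do.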
As part of the Virtual Institute for Microbial Stress and Survival (VIMSS), we have developed a series of microarray analysis approaches that exploit the operon and regulon structure of prokaryotic genomes to provide robust measures of significance for observed gene expression changes. These tools underlie a comparative microarray database linked to our MicrobesOnline Web site that we have used to explore stress response in metal-reducing microbes and B. subtilis. (The Department of Energy provided partial support for this research.)
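One way operon structure can strengthen significance estimates: genes co-transcribed in one operon should change together, so averaging over the operon reduces noise, and that average can be compared with averages over random gene sets of the same size. The sketch below is a generic permutation test along those lines, not the specific method behind our database. The gene names, simulated log ratios, and operon assignment are hypothetical, and the two-sided comparison assumes most genes have log ratios centered near zero.

```python
import random
import statistics


def operon_permutation_p(log_ratios, operon, n_perm=10_000, rng=None):
    """Two-sided permutation p-value for the mean log ratio of one operon.

    The null distribution comes from means of random gene sets of the same
    size as the operon, drawn without replacement from all genes.
    """
    rng = rng if rng is not None else random.Random()
    observed = statistics.fmean(log_ratios[g] for g in operon)
    all_values = list(log_ratios.values())
    k = len(operon)
    extreme = 0
    for _ in range(n_perm):
        null_mean = statistics.fmean(rng.sample(all_values, k))
        if abs(null_mean) >= abs(observed):
            extreme += 1
    # Add-one correction: a finite permutation test cannot justify p = 0.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical data: 200 background genes plus one induced four-gene operon.
rng = random.Random(0)
log_ratios = {f"gene{i}": rng.gauss(0.0, 0.3) for i in range(200)}
operon = ["opA", "opB", "opC", "opD"]
for gene, value in zip(operon, (1.1, 0.9, 1.0, 1.2)):
    log_ratios[gene] = value
p = operon_permutation_p(log_ratios, operon, rng=random.Random(1))
print(p)
```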
Bacterial Comparative Genomics Tools
As part of a large project on stress response in metal-reducing microbes, we have developed a microbial comparative genomics database and analysis pipeline. The MicrobesOnline Web site currently stores sequence and annotation information for more than 200 fully sequenced bacterial and archaeal genomes. The Web site integrates a number of features: a new comparative genome browser; a novel Gene Ontology (GO) browser; a comparative pathway browser; annotation of operons and regulons, based on a newly developed unsupervised prediction algorithm; a "shopping cart" feature that allows users to form custom groups of genes for further analysis; and a community sequence annotation framework. These tools, recently used by a consortium of eight leading laboratories to investigate and annotate the genome of Desulfovibrio desulfuricans G20 (likely to be renamed D. alaskensis G20), and by others, are beginning to establish the site as a central repository for microbial genome annotation, where researchers can make their results immediately available to the community at large. This database is also linked to functional genomic information. In the next year we will link it to environmental sequence data. This framework is a foundation for comparative network analyses and for understanding the evolution of genomes. The tools for microbial genome analysis that underlie this site have driven new theories on the evolution of operons and the origin of strand bias in genomes. (The Department of Energy provided partial support for this research.)
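For intuition about operon annotation, a much cruder rule than the site's unsupervised predictor is often used as a baseline: adjacent genes on the same strand with a short intergenic gap are placed in the same predicted operon. The sketch below implements only that baseline. The Gene record, the 50 bp cutoff, and the coordinates are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int   # leftmost coordinate (bp)
    end: int     # rightmost coordinate (bp)
    strand: str  # "+" or "-"


def predict_operons(genes, max_gap=50):
    """Baseline operon calls: chain neighbors on the same strand with gap <= max_gap."""
    ordered = sorted(genes, key=lambda g: g.start)
    operons = []
    for gene in ordered:
        if operons:
            prev = operons[-1][-1]
            gap = gene.start - prev.end - 1  # intergenic distance; negative means overlap
            if gene.strand == prev.strand and gap <= max_gap:
                operons[-1].append(gene)
                continue
        operons.append([gene])
    return [[g.name for g in op] for op in operons]


genes = [
    Gene("geneA", 100, 999, "+"),
    Gene("geneB", 1010, 1800, "+"),  # 10 bp gap, same strand: joins geneA
    Gene("geneC", 2100, 2900, "+"),  # 299 bp gap: starts a new operon
    Gene("geneD", 2920, 3500, "-"),  # opposite strand: starts a new operon
]
print(predict_operons(genes))
# [['geneA', 'geneB'], ['geneC'], ['geneD']]
```

A fixed distance threshold ignores evidence such as conservation of gene order across genomes and co-expression, which is part of why learned predictors tend to do better than this baseline.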
Software for Modeling of Cellular Systems
One of our central tasks is the development of the BioSPICE software tool, which integrates data, data analysis, and modeling in a single user-friendly interface. This tool is being developed by a large collaborative community with an open architecture so that new data types, bioinformatic analyses, and modeling tools can be easily added by the academic and industrial communities. We adhere to this open-source, open-software philosophy in the hope that the biological/biophysical community will adopt this platform as a standard. As part of this project we have developed a number of modules for pathway reconstruction by homology, network motif finding, and pathway knowledge capture. With collaborators, we are developing modules for spatial simulation and principled model/data comparison. The pathway tools are now being used in VIMSS and with the Alliance for Cellular Signaling to analyze both bacterial and eukaryotic regulatory networks. (The Department of Energy and the Defense Advanced Research Projects Agency provided partial support for this research.)
Analysis of Cellular Pathways
Armed with a fundamental understanding of how small groups of molecules may operate in the cellular environment, and informed by functional genomic and other experiments, we wish to understand how motifs are wired together to produce particular cellular behaviors, why certain regulatory choices are selected by evolution, and what principles govern control in these networks. We analyze cellular pathways with clear engineering specifications that must be met if the organism is to survive. Consider, for example, bacterial and immune cell chemotaxis. The vastly different physical sizes of the two cell types put different physical constraints on how chemical gradient sensing can be accomplished. Even homologous chemotaxis pathways in bacteria living in distinct environments will be subject to different evolutionary selection on their regulation. We have recently compared models for chemotaxis in E. coli (an enteric bacterium) and B. subtilis (a soil microbe). Although the two chemotaxis systems are phenotypically similar and built from orthologous proteins, the mechanisms that implement their nearly identical control laws differ between the two organisms. These differences make, for example, the adaptation time of the B. subtilis network less sensitive to changes in its kinetic parameters than that of the E. coli network.
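A statement such as "less sensitive to changes in the kinetic parameters" can be made quantitative with normalized (logarithmic) sensitivities, S_k = d ln(tau) / d ln(k), where tau is the adaptation time. The sketch below estimates them by central finite differences for a deliberately simple, hypothetical integral-feedback model: activity a relaxes toward k1*(s - m)/k2 while a methylation-like variable m integrates k3*a, so a returns exactly to zero after a step in stimulus s. This toy model and its parameter values are assumptions, not our E. coli or B. subtilis chemotaxis models. Because all three parameters are rate constants, scaling them together only rescales time, so the sensitivities should sum to about -1; the code uses that as a sanity check.

```python
import math


def adaptation_time(k1, k2, k3, s=1.0, dt=1e-3, t_max=100.0):
    """Time for activity a to fall back to half its peak after a step in s.

    Hypothetical model: da/dt = k1*(s - m) - k2*a,  dm/dt = k3*a.
    The only steady state has a = 0, so adaptation is perfect.
    Integrated with forward Euler; the half-peak crossing is linearly interpolated.
    """
    a = m = t = 0.0
    peak = 0.0
    while t < t_max:
        prev_a, prev_t = a, t
        # Simultaneous update, so both right-hand sides use the old a and m.
        a, m = a + (k1 * (s - m) - k2 * a) * dt, m + k3 * a * dt
        t += dt
        if a > peak:
            peak = a
        elif a < 0.5 * peak:
            frac = (prev_a - 0.5 * peak) / (prev_a - a)
            return prev_t + frac * dt
    raise ValueError("no adaptation within t_max")


def log_sensitivity(func, params, name, h=0.01):
    """Central-difference estimate of d ln(func) / d ln(params[name])."""
    up = {**params, name: params[name] * (1 + h)}
    down = {**params, name: params[name] * (1 - h)}
    return (math.log(func(**up)) - math.log(func(**down))) / (
        math.log(1 + h) - math.log(1 - h))


params = {"k1": 1.0, "k2": 2.0, "k3": 0.5}  # hypothetical rate constants
sens = {k: log_sensitivity(adaptation_time, params, k) for k in params}
for k, v in sens.items():
    print(f"S_{k} = {v:+.3f}")
print(f"sum = {sum(sens.values()):+.3f}")  # close to -1 (time-rescaling check)
```

Comparing two network architectures would then amount to comparing the magnitudes of these sensitivities, parameter by parameter, across the two models.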
Whether this extra "robustness" property is important for survival remains to be seen, but the comparative analysis of cellular pathways raises questions about which aspects of regulation in these networks are under strong selection and which are historical artifacts. We have begun to use comparative genomics to quantify the selective pressures on different aspects of network function, and to use a variant of evolutionary game theory to understand what sorts of environmental pressures select for different network behaviors. Recently we have developed a sophisticated evolutionary game theory that predicts under which environmental conditions, and with what sensor architectures, clonal bacterial populations would employ different diversification strategies, such as phase variation.
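A classic and much simpler relative of such models shows why a clonal population might diversify at all. Suppose, hypothetically, that each generation a fraction p of cells adopts phenotype B and the rest phenotype A, and that the environment is drawn independently each generation. The long-run growth rate is then the expected logarithm of the per-generation growth factor, and maximizing it over p can favor a mixed population over either pure strategy. The sketch below finds the optimum by grid search using made-up fitness values. It illustrates bet hedging in general, not our game-theoretic analysis of phase variation or sensor architectures, and it ignores sensing entirely.

```python
import math


def long_run_growth(p, env_probs, fitness_a, fitness_b):
    """Expected log growth per generation when a fraction p of cells is phenotype B.

    env_probs[i]: probability of environment i in any generation (i.i.d.).
    fitness_a[i], fitness_b[i]: growth factors of phenotypes A and B in environment i.
    """
    total = 0.0
    for prob, fa, fb in zip(env_probs, fitness_a, fitness_b):
        growth = (1 - p) * fa + p * fb
        if growth <= 0:
            return -math.inf  # the lineage goes extinct in this environment
        total += prob * math.log(growth)
    return total


def best_fraction(env_probs, fitness_a, fitness_b, grid=1000):
    """Grid search for the fraction of B cells that maximizes long-run growth."""
    candidates = [i / grid for i in range(grid + 1)]
    return max(candidates,
               key=lambda p: long_run_growth(p, env_probs, fitness_a, fitness_b))


# Made-up example: each phenotype survives only in "its" environment.
env_probs = [0.6, 0.4]
fitness_a = [2.0, 0.0]
fitness_b = [0.0, 2.0]
p_opt = best_fraction(env_probs, fitness_a, fitness_b)
print(p_opt)  # 0.4
```

In this extreme case the optimum matches the environment probabilities (the proportional-betting result), and both pure strategies (p = 0 or p = 1) eventually go extinct.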
With the development of the above theoretical, computational, and experimental framework, we are undertaking integrated applications projects. For example, in work started by a former HHMI graduate student, Leor Weinberger, in collaboration with David Schaffer (University of California, Berkeley), we have been computationally and experimentally exploring gene expression and viral gene therapeutic control of HIV-1 and its propagation. The combination of quantitative measurement, mechanistic modeling, and consideration of the population dynamics of the organism and its evolution will be a hallmark of our future research. (The Defense Advanced Research Projects Agency, Department of Energy, Office of Naval Research, and the National Institutes of Health provided partial funding for this research.)
The Virtual Institute for Microbial Stress and Survival
The focus of VIMSS is to rapidly deduce the stress response and metabolic pathways responsible for cellular survival in the environment of metal-reducing microbes. To accomplish this, Terry Hazen (Lawrence Berkeley National Laboratory) and I codirect a consortium of researchers at three universities, three national laboratories, and an industrial site to uncover the stress response pathways in three soil bacteria and compare their differential regulatory strategies. Our goals are to understand how they survive in the natural environment, interact with their communities, and are naturally stimulated to reduce metals and radionuclides for bioremediation purposes. Sophisticated biomass production, cellular imaging, functional genomics, and computational biology facilities have been developed and are now in operation to accomplish these aims. (The Department of Energy has supported this research.)
Last updated May 15, 2007