Integrating Chemistry and Evolution to Illuminate and Program Biology

Research Summary

David Liu's research integrates chemistry and evolution to illuminate and program biology, especially toward the development of new therapeutics. His major research interests include (1) the discovery of therapeutically relevant synthetic molecules using DNA-templated synthesis, a technique developed in his laboratory, and Darwinian selection; (2) the characterization and engineering of genome-editing proteins toward next-generation human therapeutics; and (3) the laboratory evolution and delivery in vivo of proteins that modify information flow in human cells.

The past century of life sciences research has resulted in an emerging understanding of the ways in which DNA, RNA, proteins, and small molecules regulate information flow in living systems. This understanding has made increasingly realistic the possibility of not just reading but precisely manipulating biological information in humans by changing the structure of our genomes, altering gene expression patterns, engineering new conditions under which genes are expressed, and creating new circuits that interface exogenous synthetic molecules with gene expression programs. The purposeful manipulation of information flow in human cells and patients has the potential to (1) reveal and validate causal relationships among genes, gene products, and human disease and (2) lead to breakthrough small-molecule or macromolecular therapeutics that address disease at the most fundamental level of our software, the human genome, as well as at the level of gene products.

Although the vision of manipulating gene sequences, gene regulation, and gene products with molecular precision in mammalian cells (and, eventually, in humans) has enormous potential to benefit society, several daunting challenges must be overcome before this vision can be fully realized. Perhaps the most significant of these challenges is how to create, with a practical efficiency and success rate, the many protein or nucleic acid machines that are needed to alter genomes, transcriptomes, or proteomes with a sufficiently high degree of selectivity and potency. Realizing a vision in which arbitrary genes, transcripts, or proteins can be manipulated in mammalian cells requires fundamentally new approaches to generating, at an unprecedented scale, protein or nucleic acid machines with precise, tailor-made properties.

These approaches will likely exploit recently discovered natural proteins, using state-of-the-art technologies such as phage-assisted continuous evolution (PACE) to rapidly evolve and engineer these proteins toward uses that advance the science of therapeutics. For example, the creation of robust platforms of programmable CRISPR (Cas9)-based or TALE-based genome editing and transcriptional regulation tools that are capable of turning "on," turning "off," or altering the nucleotide sequence of any combination of genes or regulatory sequences in the human genome represents an ambitious but well-defined goal that would have a major impact on illuminating disease biology and potentially treating genetic diseases.

In addition to evolving and engineering macromolecules, discovering and developing small molecules that can modulate the biological activities of targets validated using programmable genome-engineering proteins are essential activities to connect new biological insights to leads for therapeutic development. Some targets may only be addressable using macromolecular therapeutics, by virtue of the binding energies of macromolecules and their ability to catalyze transformations such as manipulating the covalent structure of genes and proteins. For other targets, however, small molecules will likely remain the most promising class of agents to modulate activities in therapeutically relevant contexts. Therefore, the development and application of new, highly efficient small-molecule discovery technologies, such as the selection of DNA-encoded small-molecule libraries against many biological targets of interest in a single experiment, will play crucial roles.

The activities needed to realize this vision can be classified into the following three phases:

Phase 1: Develop the tools. New methodologies and technologies to characterize, engineer, and evolve proteins will be developed and applied to transform natural components such as Cas9 or TALE domains into variants with the specificity, context independence, activity level, stability, cellular compatibility, and effector functions necessary to illuminate or address human disease. These effector functions will likely include DNA cleavage, transcriptional activation, transcriptional repression, epigenetic modification, and recombination to insert, delete, or replace alleles. Although these activities have become a focus of several laboratories, including our own, many of the key developments and insights either have not yet been reported or have only very recently been described. Importantly, TALE- and CRISPR-based systems are programmable using a simple code that relates target DNA sequences with TALE or CRISPR protein or RNA sequences. Because this programmability alone, although crucial, is insufficient to ensure that these tools will be robust or accessible enough to support Phase 2 and Phase 3 activities, methods to rapidly characterize and improve these tools must also be developed.

In addition to programmable DNA-binding proteins and protein–RNA complexes, other macromolecules capable of manipulating biological information flow in human cells—including antibodies, proteases, sortases, recombinases, polymerases, and nucleases—are also poised to play key roles in the understanding and treatment of disease. As is the case with TALE and CRISPR systems, a primary determinant of the likely impact of these proteins is our ability to engineer or evolve therapeutically relevant levels of activity, specificity, stability, or cell-state dependence. Therefore, general methods such as high-throughput specificity profiling or PACE that can efficiently characterize and improve diverse classes of proteins may prove especially valuable to Phase 1 efforts.

Phase 2: Discover the programs. Sets of evolved or engineered macromolecules generated in Phase 1 will be used to discover and test causal relationships between genes and disease-associated pathways in mammalian cells. As Phase 1 methods become increasingly effective, and ever larger sets of these tools become accessible, Phase 2 activities will transition from a hypothesis-testing mode (Does gene A when upregulated and gene B when downregulated induce disease if gene C is mutated?) into a hypothesis-generating mode ("forward genetics") with the goal of discovering sets of genes or proteins that when activated, repressed, or modified alter the propensity of human cells to enter a diseased state.

Phase 3: Enable therapeutics. The knowledge from Phase 1 and Phase 2 will trigger new drug discovery efforts through the identification of new targets for small-molecule screening and development. In addition, the programming tools themselves, if sufficiently specific and active, have potential as macromolecular therapeutics. Phase 3 efforts therefore will aim to develop both small-molecule and macromolecular therapeutics that program human cells in the ways discovered in Phase 2. The macromolecular side of this phase will require characterizing and improving delivery (using new delivery technologies such as supercharged proteins), biodistribution, and immunogenicity, as well as conducting efficacy studies in cell culture and animal models of human disease.

Implementing this ambitious vision in a way that has an effect on society outside of the laboratory will foster—and require—a multidisciplinary, highly collaborative culture that seamlessly integrates chemists, synthetic biologists, macromolecule engineering and evolution experts, cell biologists, clinicians, bioinformaticists, industry experts, and entrepreneurs.

Specific examples of transformative applications include

  • Revealing the genetic dependencies of oncogenesis, infectious disease progression, and metabolic disorders in therapeutically relevant settings;
  • Programming the expression of sets of transcription factors that induce the differentiation or transdifferentiation of therapeutic cells (e.g., pancreatic exocrine cells into beta cells in diabetics, white adipose tissue into brown fat in patients with metabolic disorders, or serotonergic neurons into dopaminergic neurons in patients with Parkinson's disease);
  • Altering the structure of the genes in infected individuals to disrupt the life cycle of infectious disease agents (as a validated example, editing CCR5 in patients with HIV);
  • Programming cells containing cancer- or infectious disease-associated genetic changes to undergo apoptosis; and
  • Implicating genes and gene combinations that grant resistance or sensitivity to known bioactive molecules for which no target is known.

Other support includes grants from the Defense Advanced Research Projects Agency and the National Institutes of Health.

As of October 17, 2014

Scientist Profile

Harvard University
Chemical Biology