Over the last several years we have focused on two areas of computational biology research. One is the development of improved methods for comparative modeling of structures for proteins and protein complexes. The second concerns structure, function, interactions, and evolution of proteins that are involved in DNA metabolism; research in this area is carried out using computational methods alone or in conjunction with experimentation.
Comparative modeling uses known structures of proteins as templates to generate three-dimensional models for homologous proteins with unknown structure. It is the most accurate method of protein structure prediction and is often used to explain experimental findings or to guide experimental design. We are particularly interested in distant comparative modeling, that is, cases in which the protein of interest and the structural template are only remotely related, so that defining equivalent positions in the corresponding polypeptide chains is difficult. Despite the challenges involved, it is often possible to produce models accurate enough to serve as valuable structural frameworks for experimental researchers.
Over the years, it has become evident that the single most important factor affecting the accuracy of a homology-based model is the accuracy of the sequence alignment with the structural template. Given that more related structures are becoming available for use as modeling templates, optimizing template selection has also become an increasingly important problem. During the past few years, we have focused on improving sequence–structure alignments. One of the tools we developed for delineating reliable alignment regions is the intermediate sequence search procedure (PSI-BLAST-ISS), which not only enables detection of reliable alignment regions but also suggests likely alignment variants in unreliable yet structurally conserved regions. In our experience, it is often possible to select the correct alignment in such unreliable regions by assessing how well each of the candidate alignment variants fits within the context of the molecular model. We discovered that relying on the consensus assessment derived from several models of fairly close homologs can further increase our ability to avoid alignment errors.
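The consensus idea described above can be illustrated with a minimal sketch. It is not our actual protocol: the function name `consensus_choice`, the variant labels, the homolog names, and all score values are hypothetical, and the scores stand in for any model-quality measure in which lower means a better fit. Each homolog model ranks the candidate alignment variants independently, and the variant with the best mean rank is selected.

```python
from collections import defaultdict


def consensus_choice(scores):
    """Pick the alignment variant with the best mean rank across models.

    scores: {model_name: {variant_name: score}}, lower score = better fit.
    Assumes every model was scored for the same set of variants.
    Returns (best_variant, {variant_name: mean_rank}).
    """
    rank_sums = defaultdict(float)
    for per_variant in scores.values():
        # Rank the variants within one model, best (lowest score) first.
        ordered = sorted(per_variant, key=per_variant.get)
        for rank, variant in enumerate(ordered, start=1):
            rank_sums[variant] += rank
    n_models = len(scores)
    mean_ranks = {v: total / n_models for v, total in rank_sums.items()}
    best = min(mean_ranks, key=mean_ranks.get)
    return best, mean_ranks


# Made-up scores for three candidate alignments of one unreliable region,
# evaluated in models of the target and of two close homologs.
example_scores = {
    "target":    {"shift0": -1.2, "shift+1": -1.5, "shift-1": -0.3},
    "homolog_A": {"shift0": -2.0, "shift+1": -1.1, "shift-1": -0.4},
    "homolog_B": {"shift0": -1.8, "shift+1": -1.0, "shift-1": -0.9},
}

best, mean_ranks = consensus_choice(example_scores)
print(best, mean_ranks)
```

In this toy input the target model alone would favor `shift+1`, whereas the consensus over all three models favors `shift0`, which is the kind of disagreement that a single model cannot reveal. Mean rank is only one possible way to combine assessments; it is used here because it does not require the scores of different models to be on the same scale.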
We demonstrated that our alignment assessment protocol for comparative modeling can also be effective in detecting and correcting sequence–structure mapping errors in protein crystal structures. As a test case, we analyzed all Protein Data Bank crystal structures possessing an OB (oligonucleotide/oligosaccharide binding) fold and found that several had sequence stretches incorrectly mapped onto the structure. Moreover, we used results of computational analysis to direct a revision of the x-ray structure for one of the entries with a fairly inconspicuous error. We suggest that, in general, such computational analysis may facilitate crystal structure determination by either guiding the sequence–structure assignment process or verifying the sequence mapping within poorly defined regions.
DNA replication and maintenance of genome stability are critical functions performed by a multitude of proteins, often functioning as components of large complexes in all three domains of life (Eukarya, Bacteria, and Archaea). Given that the organization of the eukaryotic genome is the most complex, eukaryotic systems that ensure faithful duplication and maintenance of genome stability usually also display greater complexity.
We have devoted much effort to studying the eukaryotic protein complexes termed DNA sliding clamps and clamp loading complexes. These complexes function in almost all pathways of DNA metabolism, including DNA replication, recombination, and repair. We are studying DNA sliding clamps and clamp loaders at several different levels. First, we are interested in the three-dimensional structure of these complexes and their individual components, knowledge of which is important for understanding their molecular mechanism of action and how they interact with their many molecular partners. In addition to molecular mechanisms, we are interested in the evolutionary origin of DNA sliding clamps and their loaders and how strongly these complexes are conserved in eukaryotes. Answers to these questions might indicate the functional importance of a particular pathway defined by one or more of these protein complexes.
In an attempt to reveal the origins and the extent of evolutionary conservation of DNA sliding clamps and clamp loaders, we explored a large number of complete eukaryotic genomes. Not surprisingly, the DNA sliding clamp and its loader (the PCNA and RFC complexes, respectively), which are critical for DNA replication, appear to be universally present in eukaryotes. Alternative clamp loaders and the proteins of the 9-1-1 clamp are known to be conserved from yeast to humans. However, their conservation in distant branches of the eukaryotic evolutionary tree has not been studied. Using computational analysis of multiple genomes, we performed a comprehensive analysis of the phylogenetic distribution of genes encoding subunits of both alternative clamp loaders and the 9-1-1 proteins. Our results point to ancient roots of these protein families. However, unlike the PCNA and RFC protein families, neither the 9-1-1 proteins nor the alternative clamp loaders are universally conserved. We identified several cases of apparent gene loss for these protein families, suggesting that, in some organisms, their function may be dispensable. Interestingly, the observed gene losses correlate well with the corresponding proteins forming either protein complexes or subcomplexes, which suggests that once an integral part of a protein complex is lost, the other components are also quickly eliminated from the genome.
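The relationship between gene loss and complex membership can be made concrete with a small sketch. It assumes a presence/absence profile has already been compiled for each genome; the genome names and profiles below are invented placeholders rather than our results, and the function names are hypothetical. For a given complex, each genome is classified as carrying all subunits, none, or only some, and the fraction of genomes showing an all-or-none pattern is reported.

```python
def classify_complex(presence, members):
    """Classify each genome by how much of a complex it encodes.

    presence: {genome_name: set of gene names detected in that genome}
    members:  gene names of the subunits forming the complex
    Returns {genome_name: "complete" | "absent" | "partial"}.
    """
    members = set(members)
    status = {}
    for genome, genes in presence.items():
        found = members & set(genes)
        if found == members:
            status[genome] = "complete"
        elif not found:
            status[genome] = "absent"
        else:
            status[genome] = "partial"
    return status


def all_or_none_fraction(status):
    """Fraction of genomes in which the complex is fully present or fully absent."""
    if not status:
        return 0.0
    coherent = sum(1 for s in status.values() if s != "partial")
    return coherent / len(status)


# Invented presence/absence profiles; placeholders, not observed data.
toy_presence = {
    "genome_A": {"PCNA", "RAD9", "HUS1", "RAD1"},
    "genome_B": {"PCNA"},
    "genome_C": {"PCNA", "RAD9", "HUS1", "RAD1"},
    "genome_D": {"PCNA", "RAD9"},
}

clamp_911 = ["RAD9", "HUS1", "RAD1"]  # subunits of the 9-1-1 clamp
status = classify_complex(toy_presence, clamp_911)
print(status, all_or_none_fraction(status))
```

A high all-or-none fraction would be consistent with subunits of a complex being lost together. In practice, an apparent absence also has to be distinguished from a homolog that has diverged too far to be detected, which this sketch does not address.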
Whereas evolutionary studies are directed at more general biological questions, computational molecular modeling studies might be useful in understanding specific molecular mechanisms and interactions. An example of such applied protein modeling is our recent joint computational and experimental study of the 9-1-1 clamp interaction with the DNA repair protein MYH, a homolog of bacterial MutY. MYH plays an important role in repairing oxidative DNA damage, which, if unrepaired, can induce mutagenesis and lead to various degenerative diseases. In this collaborative effort, it was determined that human MYH (hMYH) physically interacts with and is stimulated by the 9-1-1 complex, the ring-shaped sliding DNA clamp, but the molecular details of the interaction remained obscure. Using models for both the 9-1-1 complex and hMYH, we were able to propose the putative hMYH interaction region. Moreover, computational analysis enabled us to identify individual hMYH residues largely responsible for stable binding to the 9-1-1 complex. Both computational findings were confirmed experimentally. As a result of the computationally derived models of protein interaction, we were able, for the first time, to identify a specific sequence motif responsible for the binding of the 9-1-1 complex.
Given the difficulties in obtaining structures of large protein complexes experimentally (either by x-ray analysis or by NMR spectroscopy), we believe computational modeling is particularly promising in helping to understand protein interactions in the context of three-dimensional structure.
Last updated August 2008