Mirny Leonid A, Gelfand Mikhail S
Harvard-MIT Division of Health Science and Technology, Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139.
Genome Biol. 2002;3(3):PREPRINT0002. doi: 10.1186/gb-2002-3-3-preprint0002. Epub 2002 Feb 19.
The concepts of orthology and paralogy are becoming increasingly important as whole-genome comparison allows orthologs and paralogs to be identified in complete genomes. Functional specificity of proteins is assumed to be conserved among orthologs and to differ among paralogs. We used this assumption to identify residues that determine the specificity of protein-DNA and protein-ligand recognition. Finding such residues is crucial for understanding mechanisms of molecular recognition and for rational protein and drug design.
Assuming that specificity is conserved among orthologs and differs among paralogs, we identify residues that correlate with this grouping by specificity. The method takes advantage of complete genomes to find multiple orthologs and paralogs. The central part of the method is a procedure for computing the statistical significance of the predictions. The procedure is based on a simple statistical model of protein evolution. When applied to a large family of bacterial transcription factors, our method identified 12 residues that are presumed to determine protein-DNA and protein-ligand recognition specificity. Structural analysis of the proteins and available experimental results strongly support our predictions. Our results suggest new experiments aimed at rational redesign of specificity in bacterial transcription factors with a minimal number of mutations.
Sets of orthologous and paralogous proteins can be easily derived from complete genomic sequences, and our method can identify putative specificity determinants in such proteins.
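The idea of finding residues that correlate with a grouping by specificity can be illustrated with a small sketch. The abstract does not name the scoring statistic or give details of its evolutionary model, so everything below is an assumption made for illustration: the score is the mutual information between an alignment column and the specificity-group labels, and significance is estimated by a plain permutation test, which stands in for (and is not) the authors' statistical model of protein evolution. The function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(column, groups):
    """Mutual information (in nats) between the residues of one alignment
    column and the specificity-group labels of the same sequences."""
    n = len(column)
    residue_counts = Counter(column)
    group_counts = Counter(groups)
    joint_counts = Counter(zip(column, groups))
    mi = 0.0
    for (residue, group), count in joint_counts.items():
        # p(r, g) * log( p(r, g) / (p(r) * p(g)) ), written with raw counts
        mi += (count / n) * math.log(
            (count * n) / (residue_counts[residue] * group_counts[group])
        )
    return mi


def permutation_p_value(column, groups, n_shuffles=1000, seed=0):
    """Fraction of random residue-to-sequence assignments whose score is at
    least as high as the observed one. ASSUMPTION: a simple shuffle null,
    used here in place of the paper's evolutionary model."""
    rng = random.Random(seed)
    observed = mutual_information(column, groups)
    shuffled = list(column)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point summation order.
        if mutual_information(shuffled, groups) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical toy data: nine sequences in three specificity groups
# (orthologs share a label, paralogs carry different labels).
groups = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
specific_column = "RRRKKKEEE"   # conserved within groups, different between
conserved_column = "GGGGGGGGG"  # identical everywhere, carries no signal

specific_mi = mutual_information(specific_column, groups)
conserved_mi = mutual_information(conserved_column, groups)
specific_p = permutation_p_value(specific_column, groups)
conserved_p = permutation_p_value(conserved_column, groups)
```

In this toy case the group-specific column scores high with a small permutation p-value, while the fully conserved column scores zero. A real analysis would run this over every column of a multiple alignment and correct for the number of columns tested.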