Recognition of analogous and homologous protein folds: analysis of sequence and structure conservation.

Author information

Russell R B, Saqi M A, Sayle R A, Bates P A, Sternberg M J

Affiliation

Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, Lincoln's Inn Fields, London, UK.

Publication information

J Mol Biol. 1997 Jun 13;269(3):423-39. doi: 10.1006/jmbi.1997.1019.

PMID:9199410

Abstract

An analysis was performed on 335 pairs of structurally aligned proteins derived from the structural classification of proteins (SCOP http://scop.mrc-lmb.cam.ac.uk/scop/) database. These similarities were divided into analogues, defined as proteins with similar three-dimensional structures (same SCOP fold classification) but generally with different functions and little evidence of a common ancestor (different SCOP superfamily classification). Homologues were defined as pairs of similar structures likely to be the result of evolutionary divergence (same superfamily) and were divided into remote, medium and close sub-divisions based on the percentage sequence identity. Particular attention was paid to the differences between analogues and remote homologues, since both types of similarities are generally undetectable by sequence comparison and their detection is the aim of fold recognition methods. Distributions of sequence identities and substitution matrices suggest a higher degree of sequence similarity in remote homologues than in analogues. Matrices for remote homologues show similarity to existing mutation matrices, providing some validity for their use in previously described fold recognition methods. In contrast, matrices derived from analogous proteins show little conservation of amino acid properties beyond broad conservation of hydrophobic or polar character. Secondary structure and accessibility were more conserved on average in remote homologues than in analogues, though there was no apparent difference in the root-mean-square deviation between these two types of similarities. Alignments of remote homologues and analogues show a similar number of gaps, openings (one or more sequential gaps) and inserted/deleted secondary structure elements, and both generally contain more gaps/openings/deleted secondary structure elements than medium and close homologues. 
These results suggest that gap parameters for fold recognition should be more lenient than those used in sequence comparison. Parameters were derived from the analogue and remote homologue datasets for potential use in fold recognition methods. Implications for protein fold recognition and evolution are discussed.
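
The quantities compared in the abstract (percentage sequence identity, gaps, and openings, where an opening is one or more sequential gaps) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `alignment_stats`, the use of `-` as the gap character, and the choice to compute identity over aligned (non-gap) positions only are all assumptions made for illustration, and the paper may define its normalisation differently.

```python
def alignment_stats(seq_a: str, seq_b: str, gap: str = "-") -> dict:
    """Summarise a pairwise alignment given as two equal-length gapped strings.

    Hypothetical helper for illustration only. Assumptions:
    - percentage identity is computed over columns where both residues are present;
    - a gap is a single column with a gap character in exactly one sequence;
    - an opening is a maximal run of consecutive gaps in the same sequence.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    identical = aligned = gaps = openings = 0
    previous_gap_side = None  # "a" or "b" if the previous column was a gap

    for res_a, res_b in zip(seq_a, seq_b):
        a_is_gap = res_a == gap
        b_is_gap = res_b == gap
        if a_is_gap and b_is_gap:
            continue  # a column with gaps in both sequences carries no information
        if a_is_gap or b_is_gap:
            gaps += 1
            side = "a" if a_is_gap else "b"
            if side != previous_gap_side:
                openings += 1  # a new run of gaps starts here
            previous_gap_side = side
        else:
            aligned += 1
            if res_a == res_b:
                identical += 1
            previous_gap_side = None

    percent_identity = 100.0 * identical / aligned if aligned else 0.0
    return {
        "aligned": aligned,
        "identical": identical,
        "percent_identity": percent_identity,
        "gaps": gaps,
        "openings": openings,
    }


if __name__ == "__main__":
    # Toy alignment, not taken from the paper's dataset.
    print(alignment_stats("ACDE--GH", "ACD-FFGY"))
```

In the toy alignment, the single gap in the second sequence and the two-column gap in the first sequence count as three gaps but only two openings, which is the distinction the abstract draws between the two measures.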
