Abagyan R A, Batalov S
Skirball Institute of Biomolecular Medicine, Biochemistry Department, NYU Medical Center, NY 10016, USA.
J Mol Biol. 1997 Oct 17;273(1):355-68. doi: 10.1006/jmbi.1997.1287.
Sequence comparison remains a powerful tool to assess the structural relatedness of two proteins. To develop a sensitive sequence-based procedure for fold recognition, we performed an exhaustive global alignment (with zero end gap penalties) between sequences of protein domains with known three-dimensional folds. The subset of 1.3 million alignments between sequences of structurally unrelated domains was used to derive a set of analytical functions that represent the probability of structural significance for any sequence alignment at a given sequence identity, sequence similarity and alignment score. Analysis of overlap between structurally significant and insignificant alignments shows that sequence identity and sequence similarity measures are poor indicators of structural relatedness in the "twilight zone", while the alignment score allows much better discrimination between alignments of structurally related and unrelated sequences for a wide variety of alignment settings. A fold recognition benchmark was used to compare eight different substitution matrices with eight sets of gap penalties. The best performing matrices were Gonnet and Blosum50 with normalized gap penalties of 2.4/0.15 and 2.0/0.15, respectively, while the positive matrices were the worst performers. The derived functions and parameters can be used for fold recognition via a multilink chain of probability weighted pairwise sequence alignments.
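As an illustration of the alignment mode named in the abstract (global alignment with zero end gap penalties), the following is a minimal sketch, not the authors' implementation. It makes several assumptions that are not in the source: a simple match/mismatch score stands in for a substitution matrix such as Gonnet or Blosum50, a gap of length k is charged `gap_open + (k - 1) * gap_ext`, and the function name `semiglobal_score` is hypothetical. The default gap values reuse the 2.4/0.15 pair quoted above, although the abstract describes those as normalized penalties for a specific matrix, so they serve here only as placeholders.

```python
NEG_INF = float("-inf")


def semiglobal_score(a, b, match=1.0, mismatch=-1.0, gap_open=2.4, gap_ext=0.15):
    """Best score of a global alignment of a and b with free end gaps.

    Assumed conventions (not taken from the paper):
      * match/mismatch scoring replaces a real substitution matrix;
      * an internal gap of length k costs gap_open + (k - 1) * gap_ext;
      * leading and trailing gaps in either sequence cost nothing.
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap;
    # Y: b[j-1] aligned to a gap (three-state affine-gap recursion).
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]

    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = 0.0  # leading overhang of a is free
    for j in range(1, m + 1):
        Y[0][j] = 0.0  # leading overhang of b is free

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            X[i][j] = max(M[i - 1][j] - gap_open,
                          Y[i - 1][j] - gap_open,
                          X[i - 1][j] - gap_ext)
            Y[i][j] = max(M[i][j - 1] - gap_open,
                          X[i][j - 1] - gap_open,
                          Y[i][j - 1] - gap_ext)

    def best(i, j):
        return max(M[i][j], X[i][j], Y[i][j])

    # Trailing overhangs are free: the alignment may end anywhere in the
    # last row (a fully consumed) or the last column (b fully consumed).
    last_column = [best(i, m) for i in range(n + 1)]
    last_row = [best(n, j) for j in range(m + 1)]
    return max(last_column + last_row)


if __name__ == "__main__":
    # The short sequence is embedded in the longer one; overhangs are not penalized.
    print(semiglobal_score("ACGT", "TTACGTTT"))
```

Because end gaps are free, a short domain sequence embedded in a longer one scores the same as the domain aligned to itself, which is the property that makes this mode suitable for comparing domains of different lengths. Replacing the match/mismatch term with a lookup in a real substitution matrix would be the natural next step.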