Jaroszewski L, Rychlewski L, Zhang B, Godzik A
Department of Chemistry, University of Warsaw, Warszawa, Poland.
Protein Sci. 1998 Jun;7(6):1431-40. doi: 10.1002/pro.5560070620.
Several fold recognition algorithms are compared with one another in terms of prediction accuracy and significance. On standard benchmarks, hybrid methods, which combine scoring based on sequence-sequence and sequence-structure matching, surpass both sequence and threading methods in the number of accurate predictions. However, sequence similarity contributes most to the prediction accuracy. This strongly argues that most examples of apparently nonhomologous proteins with similar folds are actually related by evolution. While disappointing from the perspective of the fundamental understanding of protein folding, this gives fold recognition methods new significance as a possible first step in function prediction. Although hybrid methods are more accurate at fold prediction than either the sequence or threading methods, each method is correct in some cases where the others fail. This partly reflects the different perspectives on the sequence/structure relationship embedded in the various methods. To combine predictions from different methods, estimates of the significance of predictions are made for all methods. With the help of such estimates, it is possible to develop a "jury" method whose accuracy is higher than that of any single method. Finally, building full three-dimensional models for all top predictions helps to eliminate possible false positives, in which alignments that are optimal for the one-dimensional sequences lead to unresolvable steric conflicts in the full three-dimensional models.
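The "jury" idea of combining predictions through their significance estimates can be illustrated with a minimal sketch. The abstract does not specify the combination rule, so this is an assumption-laden illustration rather than the authors' procedure: it assumes each method reports candidate folds with a comparable significance score (higher is better) and simply sums those scores per fold. The function name `jury_predict`, the method names, the fold labels, and the scores are all hypothetical.

```python
from collections import defaultdict


def jury_predict(predictions, min_significance=0.0):
    """Combine fold predictions from several methods by summed significance.

    predictions: dict mapping a method name to a list of
        (fold_id, significance) pairs. Significance scores are ASSUMED
        to be on a comparable scale across methods (hypothetical).
    min_significance: hits scoring below this threshold are ignored.

    Returns (fold_id, total_significance) for the best-supported fold,
    or None if no hit passes the threshold.
    """
    totals = defaultdict(float)
    for hits in predictions.values():
        for fold_id, significance in hits:
            if significance >= min_significance:
                totals[fold_id] += significance
    if not totals:
        return None
    # The fold with the largest combined significance wins the vote.
    return max(totals.items(), key=lambda item: item[1])


# Hypothetical scores: the sequence method alone would pick fold_B,
# but the combined evidence favors fold_A.
example = {
    "sequence": [("fold_A", 2.0), ("fold_B", 3.0)],
    "threading": [("fold_A", 2.5)],
    "hybrid": [("fold_A", 1.0), ("fold_B", 1.5)],
}
print(jury_predict(example))
```

Summing scores is only one possible rule; a weighted vote or a per-method calibration of significance would fit the same interface.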