Redefining the goals of protein secondary structure prediction.

Author information

Rost B, Sander C, Schneider R

Affiliation

EMBL Heidelberg, Germany.

Publication information

J Mol Biol. 1994 Jan 7;235(1):13-26. doi: 10.1016/s0022-2836(05)80007-5.

PMID:8289237

Abstract

Secondary structure prediction has recently surpassed an average accuracy of 70%, evaluated on the three single-residue states helix, strand and loop (Q3). But the ultimate goal is reliable prediction of tertiary (three-dimensional, 3D) structure, not 100% single-residue accuracy for secondary structure. A comparison of pairs of structurally homologous proteins with divergent sequences reveals that the same 3D fold can accommodate considerable variation in the position and length of secondary structure segments. It is therefore sufficient to predict the approximate location of helix, strand, turn and loop segments, provided they are compatible with the formation of 3D structure. Accordingly, we define here a measure of segment overlap (Sov) that is relatively insensitive to small variations in secondary structure assignments. The new segment overlap measure ranges from an ignorance level of 37% (random protein pairs), via a current level of 72% for a prediction method that feeds sequence profiles into neural networks (PHD), to an average level of 90% for homologous protein pairs. We conclude that the highest scores one can reasonably expect for secondary structure prediction are a single-residue accuracy of Q3 > 85% and a fractional segment overlap of Sov > 90%.
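The abstract names the two scores but does not give their formulas. The following is a minimal sketch, assuming observed and predicted assignments are equal-length strings over the three states H (helix), E (strand) and L (loop). Q3 is computed as the percentage of residues with the correct state. For the segment score, the sketch swaps in the later SOV99 definition of Zemla et al. (1999), which revised the Sov measure introduced here. It is not the 1994 formula, so its values need not match the figures quoted above. The function names `segments`, `q3` and `sov99` are hypothetical.

```python
from itertools import groupby


def segments(ss, state):
    """Return inclusive (start, end) indices of maximal runs of `state` in `ss`."""
    segs = []
    pos = 0
    for label, run in groupby(ss):
        length = len(list(run))
        if label == state:
            segs.append((pos, pos + length - 1))
        pos += length
    return segs


def q3(observed, predicted):
    """Percentage of residues whose three-state label is predicted correctly."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


def sov99(observed, predicted, states="HEL"):
    """Segment overlap score per the SOV99 definition (assumption: not the 1994 formula)."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have equal length")
    total = 0.0  # sum of (minov + delta) / maxov * len(s1) over overlapping pairs
    norm = 0     # N: len(s1) per overlapping pair, plus len(s1) for unmatched s1
    for state in states:
        pred_segs = segments(predicted, state)
        for s1_start, s1_end in segments(observed, state):
            len1 = s1_end - s1_start + 1
            overlapped = False
            for s2_start, s2_end in pred_segs:
                # minov: residues shared by the observed and predicted segment
                minov = min(s1_end, s2_end) - max(s1_start, s2_start) + 1
                if minov <= 0:
                    continue
                overlapped = True
                len2 = s2_end - s2_start + 1
                # maxov: total extent covered by either segment
                maxov = max(s1_end, s2_end) - min(s1_start, s2_start) + 1
                # delta: bonus that tolerates small shifts at segment ends
                delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
                total += (minov + delta) / maxov * len1
                norm += len1
            if not overlapped:
                norm += len1  # an observed segment with no predicted partner scores zero
    return 100.0 * total / norm if norm else 0.0


obs = "LLHHHHLL"
pred = "LLLHHHHL"  # same helix, shifted by one residue
print(q3(obs, pred))     # 75.0
print(sov99(obs, pred))  # 87.5
```

In this toy example, shifting the helix by one residue costs two of eight residues under Q3. The segment score stays higher because the helix is still predicted at roughly the right place and length. This is the kind of tolerance the abstract argues a segment-based measure should have.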
