Levin J M
Unité de Bioinformatique, Bat. de Biotechnologie, INRA, Jouy-en-Josas, France.
Protein Eng. 1997 Jul;10(7):771-6. doi: 10.1093/protein/10.7.771.
This paper presents a simple and robust secondary structure prediction scheme (SIMPA96) based on an updated version of the nearest neighbour method. Using a larger database of known structures, the Blosum 62 substitution matrix and a regularization algorithm, the three-state prediction accuracy is increased by 4.7 percentage points to 67.7% for a single sequence, and to as much as 72.8% when multiple alignments are used. The increase in prediction accuracy with respect to the previous version can be almost entirely ascribed to the sevenfold increase in the size of the database. A more detailed analysis of the results shows that badly predicted regions of a protein sequence are randomly distributed throughout the database, and that the goal of perfect secondary structure prediction by methods that use only local sequence information is illusory.
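The general idea of window-based nearest-neighbour prediction, followed by regularization and scoring by three-state accuracy, can be sketched as below. This is a minimal illustration under stated assumptions, not the published SIMPA96 algorithm. The window half-width (3), the score threshold, the score-weighted voting, and the smoothing rule (helices shorter than 4 and strands shorter than 2 residues become coil) are all hypothetical choices. `toy_score` is a placeholder identity score standing in for the Blosum 62 matrix and does not reproduce its values. Three-state accuracy (Q3) is computed as the percentage of residues whose predicted state (H, E or C) matches the observed state.

```python
from collections import Counter
from itertools import groupby


def toy_score(a: str, b: str) -> int:
    # Hypothetical placeholder for Blosum 62: +4 for identity, -1 otherwise.
    return 4 if a == b else -1


def window(seq: str, i: int, half: int) -> str:
    # Residues i-half .. i+half, padded with '-' beyond the chain ends.
    return "".join(seq[j] if 0 <= j < len(seq) else "-"
                   for j in range(i - half, i + half + 1))


def window_score(w1: str, w2: str, score) -> int:
    # Sum substitution scores over aligned positions, skipping padding.
    return sum(score(a, b) for a, b in zip(w1, w2) if a != "-" and b != "-")


def predict(query: str, database, half: int = 3, threshold: int = 8,
            score=toy_score) -> str:
    """database: list of (sequence, states) pairs with states in 'H', 'E', 'C'."""
    preds = []
    for i in range(len(query)):
        qw = window(query, i, half)
        votes = Counter()
        for seq, states in database:
            for j in range(len(seq)):
                s = window_score(qw, window(seq, j, half), score)
                if s >= threshold:
                    # Each sufficiently similar neighbour votes for the state
                    # of its central residue, weighted by its similarity score.
                    votes[states[j]] += s
        preds.append(votes.most_common(1)[0][0] if votes else "C")
    return "".join(preds)


def regularize(pred: str, min_len=None) -> str:
    # Hypothetical smoothing: segments shorter than a minimum length become coil.
    min_len = min_len or {"H": 4, "E": 2}
    out = []
    for state, run in groupby(pred):
        n = len(list(run))
        out.append(("C" if n < min_len.get(state, 1) else state) * n)
    return "".join(out)


def q3(pred: str, observed: str) -> float:
    # Percentage of residues whose predicted state matches the observed one.
    return 100.0 * sum(p == o for p, o in zip(pred, observed)) / len(observed)
```

For example, `regularize(predict("ACDEFGHIK", [("ACDEFGHIK", "CHHHHHHEC")]))` first recovers the database states for a query identical to a database entry, then converts the isolated one-residue strand to coil. The sketch also makes the abstract's main point concrete: with this kind of voting scheme, a larger database supplies more informative neighbours for each window.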