Ouali M, King R D
Department of Computer Science, University of Wales, Ceredigion, United Kingdom.
Protein Sci. 2000 Jun;9(6):1162-76. doi: 10.1110/ps.9.6.1162.
We describe a new classifier for protein secondary structure prediction, formed by cascading different types of classifiers built from neural networks and linear discrimination. The new classifier achieves an accuracy of 76.7% (assessed by a rigorous full jackknife procedure) on a new nonredundant dataset of 496 nonhomologous sequences (obtained from G.J. Barton and J.A. Cuff). This database was designed specifically to train and test protein secondary structure prediction methods, and it uses a more stringent definition of homologous sequence than previous studies. We show that it is possible to design classifiers that discriminate well among the three classes (H, E, C), with an accuracy of up to 78% for beta-strands, using only a local window and resampling techniques. This indicates that the importance of long-range interactions for the prediction of beta-strands has probably been overestimated in previous work.
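The two evaluation ingredients named in the abstract, a local window around each residue and a full jackknife, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the half-width of 6 (a 13-residue window), the one-hot encoding, the padding symbol, and the function names `encode_windows`, `jackknife_splits`, and `q3_accuracy` are all assumptions made for illustration, and the jackknife is interpreted here as holding out one sequence at a time.

```python
from typing import Iterator, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # assumed symbol for positions past the chain ends and unknown residues
ALPHABET = AMINO_ACIDS + PAD


def encode_windows(sequence: str, half_width: int = 6) -> List[List[int]]:
    """One-hot encode a local window centred on every residue.

    half_width=6 (13 residues per window) is an assumed value,
    not one taken from the abstract.
    """
    padded = PAD * half_width + sequence + PAD * half_width
    features = []
    for i in range(len(sequence)):
        window = padded[i : i + 2 * half_width + 1]
        row: List[int] = []
        for residue in window:
            index = ALPHABET.find(residue)
            if index == -1:  # unknown residue codes share the padding slot
                index = ALPHABET.index(PAD)
            one_hot = [0] * len(ALPHABET)
            one_hot[index] = 1
            row.extend(one_hot)
        features.append(row)
    return features


def jackknife_splits(n_sequences: int) -> Iterator[Tuple[List[int], int]]:
    """Hold out each sequence once and train on all the others."""
    for held_out in range(n_sequences):
        train = [i for i in range(n_sequences) if i != held_out]
        yield train, held_out


def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state (H, E or C) is correct."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings differ in length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)
```

For a dataset of 496 sequences, `jackknife_splits(496)` would yield 496 train/test partitions, so a classifier would be retrained once per held-out sequence; the per-residue accuracies could then be pooled to give a single figure comparable to the 76.7% reported above.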