Qian N, Sejnowski T J
Department of Biophysics, Johns Hopkins University, Baltimore, MD 21218.
J Mol Biol. 1988 Aug 20;202(4):865-84. doi: 10.1016/0022-2836(88)90564-5.
We present a new method for predicting the secondary structure of globular proteins based on non-linear neural network models. Network models learn from existing protein structures how to predict the secondary structure of local sequences of amino acids. The average success rate of our method on a testing set of proteins non-homologous with the corresponding training set was 64.3% on three types of secondary structure (alpha-helix, beta-sheet, and coil), with correlation coefficients of Cα = 0.41, Cβ = 0.31 and Ccoil = 0.41. These quality indices are all higher than those of previous methods. The prediction accuracy for the first 25 residues of the N-terminal sequence was significantly better. We conclude from computational experiments on real and artificial structures that no method based solely on local information in the protein sequence is likely to produce significantly better results for non-homologous proteins. The performance of our method on homologous proteins is much better than on non-homologous proteins, but is not as good as simply assuming that homologous sequences have identical structures.
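The abstract reports two quality indices without defining them. As a rough illustration only, the sketch below assumes that the success rate is the fraction of residues assigned to the correct class and that each correlation coefficient is a Matthews-type coefficient computed for one class against the other two. The function names, the one-letter labels (H, E, C) and the toy strings are hypothetical and are not taken from the paper.

```python
from math import sqrt


def success_rate(observed, predicted):
    """Fraction of residues whose predicted class equals the observed class."""
    if len(observed) != len(predicted):
        raise ValueError("sequences must have the same length")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


def correlation_coefficient(observed, predicted, cls):
    """Matthews-type correlation for one class (assumed definition)."""
    pairs = list(zip(observed, predicted))
    tp = sum(o == cls and p == cls for o, p in pairs)  # correctly predicted cls
    tn = sum(o != cls and p != cls for o, p in pairs)  # correctly rejected
    fp = sum(o != cls and p == cls for o, p in pairs)  # over-predicted
    fn = sum(o == cls and p != cls for o, p in pairs)  # missed
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when the class never occurs or is never predicted.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data: H = alpha-helix, E = beta-sheet, C = coil (illustrative labels).
observed = "HHHHEEEECCCC"
predicted = "HHHCEEECCCCH"

print(success_rate(observed, predicted))
for cls in "HEC":
    print(cls, correlation_coefficient(observed, predicted, cls))
```

Reporting a per-class coefficient next to the overall rate guards against a predictor that scores well simply by over-predicting the most common class.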