Chandonia J M, Karplus M
Department of Cellular and Molecular Pharmacology, University of California at San Francisco, USA.
Proteins. 1999 May 15;35(3):293-306.
A primary and a secondary neural network are applied to secondary structure and structural class prediction for a database of 681 non-homologous protein chains. A new method of decoding the outputs of the secondary structure prediction network is used to produce an estimate of the probability of finding each type of secondary structure at every position in the sequence. In addition to providing a reliable estimate of the accuracy of the predictions, this method gives a more accurate Q3 (74.6%) than the cutoff method which is commonly used. Use of these predictions in jury methods improves the Q3 to 74.8%, the best available at present. On a database of 126 proteins commonly used for comparison of prediction methods, the jury predictions are 76.6% accurate. An estimate of the overall Q3 for a given sequence is made by averaging the estimated accuracy of the prediction over all residues in the sequence. As an example, the analysis is applied to the target beta-cryptogein, which was a difficult target for ab initio predictions in the CASP2 study; it shows that the prediction made with the present method (62% of residues correct) is close to the expected accuracy (66%) for this protein. The larger database and use of a new network training protocol also improve structural class prediction accuracy to 86%, relative to 80% obtained previously. Secondary structure content is predicted with accuracy comparable to that obtained with spectroscopic methods, such as vibrational or electronic circular dichroism and Fourier transform infrared spectroscopy.
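The per-sequence accuracy estimate described above (averaging the estimated per-residue accuracy over all residues) can be illustrated with a minimal sketch. It is not the authors' decoding method. It assumes that calibrated per-residue probabilities for helix, strand, and coil are already available, and it takes the estimated accuracy at a residue to be the probability assigned to the predicted state. The function names and the example numbers are hypothetical.

```python
from typing import Sequence, Tuple

# Three-state alphabet: helix, strand, coil (assumed ordering of the probabilities).
STATES = ("H", "E", "C")

Probs = Sequence[Tuple[float, float, float]]


def predict_states(probs: Probs) -> str:
    """Pick the most probable state at each residue."""
    return "".join(STATES[max(range(3), key=lambda k: p[k])] for p in probs)


def expected_q3(probs: Probs) -> float:
    """Estimate Q3 for one sequence as the mean, over residues, of the
    probability assigned to the predicted (most probable) state."""
    return sum(max(p) for p in probs) / len(probs)


def observed_q3(predicted: str, actual: str) -> float:
    """Fraction of residues whose predicted state matches the observed state."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have equal length")
    return sum(a == b for a, b in zip(predicted, actual)) / len(actual)


# Hypothetical four-residue example; the probabilities are made up for illustration.
example = [(0.8, 0.1, 0.1), (0.2, 0.5, 0.3), (0.1, 0.2, 0.7), (0.4, 0.35, 0.25)]
prediction = predict_states(example)
estimate = expected_q3(example)
```

Comparing `expected_q3` with `observed_q3` once the true structure is known mirrors the beta-cryptogein comparison in the abstract (66% expected versus 62% observed), under the assumption that the probabilities are well calibrated.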