Nishikawa K
Biochim Biophys Acta. 1983 Oct 28;748(2):285-99. doi: 10.1016/0167-4838(83)90306-0.
The real efficiency of a method for predicting protein secondary structure from sequence information can be assessed by applying it to a number of proteins that lie completely outside a given data set. This type of test was performed for the three methods of Chou and Fasman (Adv. Enzymol. 47 (1978) 45-148), Robson and co-workers (J. Mol. Biol. 120 (1978) 97-120) and Lim (J. Mol. Biol. 88 (1974) 873-894), using data from 19 proteins for the first two methods and 11 proteins for the method of Lim. The prediction abilities of these methods turn out to be at almost the same level, but unexpectedly low: their average scores are all less than 55% as measured by the three-state assessment (alpha, beta and coil) and less than 45% as measured by the four-state assessment (alpha, beta, turn and coil). This level of accuracy is more than 20% lower than current expectations as summarized by Schulz and Schirmer (Principles of Protein Structure (1979) Ch. 6, Springer, New York). A joint prediction using the three methods simultaneously did not improve the results. Causes and implications of the unsatisfactory results are discussed. In this study, computer programs were prepared for the methods of Chou and Fasman and of Robson and co-workers. Although difficulties arose in computerizing the Chou-Fasman method, the prediction algorithm was put into a fully automatic form by optimizing the original rules and by introducing a modified treatment for resolving overlaps among the initially predicted regions of secondary structure. Large discrepancies observed between the original results and those obtained by the computerized method are examined.
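To make the three-state and four-state assessments concrete, the following is a minimal Python sketch of a per-residue score and of a simple joint prediction. The abstract does not give the exact scoring formula or the joint scheme, so everything here is an assumption: the score is taken as the percentage of residues whose predicted state equals the observed state, the one-letter codes `H` (alpha), `E` (beta), `T` (turn) and `C` (coil) are a hypothetical encoding, the three-state form is obtained by merging turn into coil, and the joint prediction is a plain majority vote that may differ from the procedure used in the paper.

```python
from collections import Counter


def percent_correct(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def to_three_states(states: str) -> str:
    """Assumed reduction from four states to three: turn (T) is counted as coil (C)."""
    return states.replace("T", "C")


def majority_vote(predictions: list[str], fallback: str = "C") -> str:
    """Hypothetical joint prediction: per-residue strict majority, else fallback."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    joint = []
    for column in zip(*predictions):
        state, count = Counter(column).most_common(1)[0]
        # Accept the most frequent state only if more than half the methods agree.
        joint.append(state if 2 * count > len(column) else fallback)
    return "".join(joint)


if __name__ == "__main__":
    observed = "CHHHHTTEEEC"   # made-up observed states, for illustration only
    predicted = "CHHHCTCEEHC"  # made-up prediction, for illustration only
    four_state = percent_correct(predicted, observed)
    three_state = percent_correct(to_three_states(predicted), to_three_states(observed))
    print(f"four-state: {four_state:.1f}%  three-state: {three_state:.1f}%")
    print(majority_vote(["HHC", "HEC", "EEC"]))
```

Under these assumptions the four-state score can never exceed the three-state score for the same prediction, since merging turn into coil can only turn mismatches into matches. This is in line with the lower four-state figures reported above.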