Kurgan Lukasz
Department of Electrical and Computer Engineering, University of Alberta, 2nd floor, ECERF (9107 116 Street), Edmonton, AB, Canada T6G 2V4.
Protein J. 2008 Jun;27(4):234-9. doi: 10.1007/s10930-008-9129-0.
Accurately predicted protein secondary structure provides useful information for selecting targets, analyzing protein function, and predicting higher-dimensional structure. Existing research shows that using more data and a more refined search leads to better predictions. We analyze the relation between prediction accuracy and another crucial factor, protein size. Empirical tests performed with two secondary structure predictors on a large set of high-resolution, non-redundant proteins show that the average accuracies for small proteins (<100 residues) equal 73% and 54% for alpha-helices and beta-strands, respectively. The corresponding alpha-helix/beta-strand accuracies for very large proteins (>300 residues) equal 77%/68%. Similarly, tests with three secondary structure content predictors show that the prediction errors for small/very large proteins equal 0.13/0.09 for alpha-helix content and 0.09/0.06 for beta-strand content. Our tests confirm that secondary structure and content predictions for very large proteins are of statistically significantly better quality than predictions for small proteins. This contrasts with tertiary structure prediction, in which higher accuracy is obtained for smaller proteins.
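The size-stratified comparison described above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code, and the definitions are assumptions: per-state accuracy is taken as the fraction of residues in a given actual state that are predicted in that state, and content error as the absolute difference between predicted and actual content. Only the two size thresholds (<100 and >300 residues) come from the abstract. The "intermediate" bin, the H/E/C state letters, and all function names are hypothetical.

```python
from collections import defaultdict


def size_bin(length):
    """Assign a protein to a size bin by its number of residues.

    The <100 and >300 thresholds come from the abstract; the
    'intermediate' label for everything else is an assumption.
    """
    if length < 100:
        return "small"
    if length > 300:
        return "very large"
    return "intermediate"


def per_state_accuracy(actual, predicted, state):
    """Fraction of residues whose actual state is `state` that are also
    predicted as `state` (assumed definition). Returns None when the
    protein has no residue in that state."""
    positions = [i for i, s in enumerate(actual) if s == state]
    if not positions:
        return None
    hits = sum(1 for i in positions if predicted[i] == state)
    return hits / len(positions)


def content_error(actual, predicted_content, state):
    """Absolute difference between a predicted content value (a fraction
    in [0, 1]) and the actual fraction of residues in `state`."""
    return abs(predicted_content - actual.count(state) / len(actual))


def mean_accuracy_by_size(proteins, state):
    """Average per-state accuracy within each size bin.

    `proteins` is an iterable of (actual, predicted) strings of equal
    length, using hypothetical one-letter states such as H, E and C.
    """
    buckets = defaultdict(list)
    for actual, predicted in proteins:
        if len(actual) != len(predicted):
            raise ValueError("actual and predicted strings differ in length")
        accuracy = per_state_accuracy(actual, predicted, state)
        if accuracy is not None:
            buckets[size_bin(len(actual))].append(accuracy)
    return {name: sum(vals) / len(vals) for name, vals in buckets.items()}
```

Calling `mean_accuracy_by_size(dataset, "H")` and `mean_accuracy_by_size(dataset, "E")` on a list of (actual, predicted) pairs would give per-bin averages comparable in form to the alpha-helix and beta-strand figures quoted above. A significance test between bins would be a separate step.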