Zhejiang Gongshang University, Hangzhou, Zhejiang, P.R. China.
Brief Bioinform. 2011 Nov;12(6):672-88. doi: 10.1093/bib/bbq088. Epub 2011 Jan 20.
Sequence-based prediction of protein secondary structure (SS) enjoys widespread and increasing use for the analysis and prediction of numerous structural and functional characteristics of proteins. The lack of a recent comprehensive, large-scale comparison of the numerous prediction methods often results in an arbitrary selection of an SS predictor. To address this void, we compare and analyze 12 popular, standalone, high-throughput predictors on a large set of 1975 proteins to provide in-depth, novel and practical insights. We show that there is no universally best predictor, and thus detailed comparative studies are needed to support informed selection of SS predictors for a given application. Our study shows that the three-state accuracy (Q3) and segment overlap (SOV3) of SS prediction currently reach 82% and 81%, respectively. We demonstrate that carefully designed consensus-based predictors improve the Q3 by an additional 2% and that homology modeling-based methods are significantly better, by 1.5% Q3, than ab initio approaches. Our empirical analysis reveals that solvent-exposed and flexible coils are predicted with higher quality than buried and rigid coils, while the inverse is true for strands and helices. We also show that longer helices are easier to predict, in contrast to longer strands, which are harder to find. The current methods confuse 1-6% of strand residues with helical residues and vice versa, and they perform poorly for residues in the β-bridge and 3(10)-helix conformations. Finally, we compare predictions of the standalone implementations of four well-performing methods with their corresponding web servers.
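To make the Q3 measure and the idea of a consensus predictor concrete, the following is a minimal sketch, not the study's evaluation code. It assumes one common reduction of DSSP's eight states to three (H, G, I to helix; E, B to strand; everything else to coil), which may differ from the mapping used in the paper, and it uses a plain per-residue majority vote as a stand-in for the "carefully designed" consensus schemes the abstract refers to. All function names are hypothetical.

```python
from collections import Counter

# Assumed 8-to-3 state reduction; the study may use a different convention.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_to_three_states(dssp: str) -> str:
    """Map a DSSP 8-state string to H/E/C; any unlisted symbol becomes coil (C)."""
    return "".join(DSSP_TO_3.get(state, "C") for state in dssp)


def q3(observed: str, predicted: str) -> float:
    """Percentage of residues whose predicted three-state label matches the observed one."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


def majority_consensus(predictions: list[str]) -> str:
    """Per-residue majority vote over several predictors (illustrative only).

    Ties are resolved in favour of the state that appears first in the column,
    i.e. the state proposed by the earliest-listed predictor.
    """
    if not predictions or len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of equal length")
    consensus = []
    for column in zip(*predictions):
        counts = Counter(column)
        consensus.append(max(counts, key=counts.get))
    return "".join(consensus)


if __name__ == "__main__":
    observed = reduce_to_three_states("HHHGG-EEB-TS")
    predicted = "HHHHCCEEEECC"
    print(f"Q3 = {q3(observed, predicted):.1f}%")
    print(majority_consensus(["HHEC", "HEEC", "HHCC"]))
```

Q3 counts residues individually, so it does not reward getting whole segments right; that is why the study reports SOV3 alongside it. The sketch omits SOV3 because its definition involves segment-level overlap terms that are not given in the abstract.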