Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada.
Curr Protein Pept Sci. 2012 Feb;13(1):6-18. doi: 10.2174/138920312799277938.
Intrinsic disorder is relatively common in proteins, plays important roles in numerous cellular activities, and its prevalence has been implicated in various human diseases. However, annotations of disorder lag behind the rapidly increasing number of known protein chains. The last decade has seen the development of a relatively large number of in-silico methods that predict disorder using the protein sequence as their input. We perform a first-of-its-kind comprehensive empirical evaluation of disorder predictors that is characterized by three novel aspects: (1) we evaluate the quality of the disorder predictions at the residue, segment, and chain levels; (2) we consider a large number of published predictors that are accessible to end users and evaluate them on a relatively large dataset of close to 500 proteins; and (3) we assess the statistical significance of the differences between the considered methods. Our study reveals that there is no universally superior predictor and that the top-performing methods are complementary. We show that while recent consensus-based predictors outperform the other considered methods for residue-level predictions, some older methods perform better for the prediction of disordered segments. Our analysis indicates that certain predictors are biased to under-predict disorder, while others tend to over-predict the number of disordered residues. We also evaluate the utility of the predicted residue-level disorder for the prediction of proteins with long disordered segments and for the prediction of chain-level disorder content. Lastly, we provide recommendations concerning the development of a new generation of consensus-based methods and of specialized methods for improved prediction of disorder content.
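To make the three evaluation levels named in the abstract concrete, the following is a minimal Python sketch, not the authors' protocol. It assumes that native annotations and predictions are given as equal-length strings of `0` (ordered) and `1` (disordered), one character per residue. All function names are hypothetical, and the choice of sensitivity, specificity, and the Matthews correlation coefficient as residue-level measures is an assumption, since the abstract does not name the measures used.

```python
import re
from math import sqrt


def residue_level_scores(native, predicted):
    """Compare per-residue labels ('1' = disordered, '0' = ordered).

    Assumption: both inputs are strings of equal length over {'0', '1'}.
    """
    if len(native) != len(predicted):
        raise ValueError("native and predicted must have the same length")
    pairs = list(zip(native, predicted))
    tp = sum(n == "1" and p == "1" for n, p in pairs)
    tn = sum(n == "0" and p == "0" for n, p in pairs)
    fp = sum(n == "0" and p == "1" for n, p in pairs)
    fn = sum(n == "1" and p == "0" for n, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # Matthews correlation coefficient; defined as 0 when a margin is empty.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def disordered_segments(labels, min_len=1):
    """Return (start, end) half-open index pairs of runs of '1'.

    min_len is a hypothetical cutoff for filtering out short segments.
    """
    return [m.span() for m in re.finditer(r"1+", labels)
            if m.end() - m.start() >= min_len]


def disorder_content(labels):
    """Chain-level disorder content: fraction of residues labelled disordered."""
    return labels.count("1") / len(labels) if labels else 0.0


def content_bias(native, predicted):
    """Positive values indicate over-prediction of disorder, negative under-prediction."""
    return disorder_content(predicted) - disorder_content(native)
```

For example, `content_bias(native, predicted)` summed or averaged over a dataset would give one simple view of whether a predictor tends to under- or over-predict disorder, which is the kind of bias the abstract reports.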