Kryshtafovych Andriy, Monastyrskyy Bohdan, Fidelis Krzysztof, Schwede Torsten, Tramontano Anna
Genome Center, University of California, Davis, California.
Biozentrum, University of Basel, Basel, Switzerland.
Proteins. 2018 Mar;86 Suppl 1(Suppl 1):345-360. doi: 10.1002/prot.25371. Epub 2017 Sep 8.
A record 42 model accuracy estimation methods were tested in CASP12. The paper presents the results of assessing these methods in the whole-model and per-residue accuracy modes. Scores from four different model evaluation packages were used as the "ground truth" for assessing the accuracy of the methods' estimates: a rigid-body score (GDT_TS) and three local-structure-based scores (LDDT, CAD and SphereGrinder). The assessment examined the ability of methods to identify the best models from among several available, predict a model's absolute accuracy score, distinguish between good and bad models, produce accurate coordinate error self-estimates, and discriminate between reliable and unreliable regions in the models. Single-model methods have advanced to the point where they are better than clustering methods at picking the best models from decoy sets. On the other hand, consensus methods, which take advantage of the large number of models available for the same target protein, are still better at distinguishing between good and bad models and at predicting the local accuracy of models. The best accuracy estimation methods were shown to outperform both the frozen-in-time reference clustering method and the best method of the corresponding class in the previous CASP. When evaluated as model selectors, the top-performing single-model methods did better than all but three CASP12 tertiary structure predictors.
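The abstract does not spell out how "identifying the best model" or "distinguishing good from bad models" is scored. As a hedged illustration only, the sketch below shows two measures commonly used for this kind of assessment: a model-selection loss (the ground-truth score of the best available model minus that of the model a method ranks first) and a ROC AUC for separating good from bad models. The function names, the 50-point GDT_TS cutoff for a "good" model, and the example numbers are hypothetical assumptions, not the paper's exact procedure or data.

```python
from typing import Sequence

# Hypothetical cutoff: models with ground-truth GDT_TS >= 50 are labeled "good".
GOOD_THRESHOLD = 50.0


def selection_loss(true_scores: Sequence[float], predicted: Sequence[float]) -> float:
    """Best available ground-truth score minus the ground-truth score of the
    model the method ranks first (0.0 means the method picked the best model)."""
    if len(true_scores) != len(predicted) or not true_scores:
        raise ValueError("need equal-length, non-empty score lists")
    # Index of the model that the accuracy estimation method ranks highest.
    picked = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true_scores) - true_scores[picked]


def roc_auc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen good model receives a higher estimate than a randomly chosen bad
    one, with ties counted as one half."""
    pos = [s for s, good in zip(scores, labels) if good]
    neg = [s for s, good in zip(scores, labels) if not good]
    if not pos or not neg:
        raise ValueError("need at least one good and one bad model")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical data for one target: ground-truth GDT_TS and a method's estimates.
true_gdt = [72.0, 65.5, 48.0, 30.2]
estimated = [61.0, 70.0, 45.0, 20.0]
labels = [score >= GOOD_THRESHOLD for score in true_gdt]

print(selection_loss(true_gdt, estimated))  # 72.0 - 65.5 = 6.5
print(roc_auc(labels, estimated))  # every good model outranks every bad one: 1.0
```

In a real assessment the loss would be averaged over all targets and the AUC computed over the pooled models; the pairwise AUC loop is quadratic but keeps the definition explicit.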