Department of Biochemistry, Duke University Medical Center, Durham, North Carolina 27710, USA.
Proteins. 2009;77(Suppl 9):29-49. doi: 10.1002/prot.22551.
For template-based modeling in the CASP8 Critical Assessment of Techniques for Protein Structure Prediction, this work develops and applies six new full-model metrics. They are designed to complement and add value to the traditional template-based assessment by the global distance test (GDT) and related scores (based on multiple superpositions of Cα atoms between target structure and predictions labeled "Model 1"). The new metrics evaluate each predictor group on each target, using all atoms of their best model with above-average GDT. Two metrics evaluate how "protein-like" the predicted model is: the MolProbity score used for validating experimental structures, and a mainchain reality score using all-atom steric clashes, bond length and angle outliers, and backbone dihedrals. Four other new metrics evaluate match of model to target for mainchain and sidechain hydrogen bonds, sidechain end positioning, and sidechain rotamers. Group-average Z-score across the six full-model measures is averaged with group-average GDT Z-score to produce the overall ranking for full-model, high-accuracy performance. Separate assessments are reported for specific aspects of predictor-group performance, such as robustness of approximately correct template or fold identification, and self-scoring ability at identifying the best of their models. Fold identification is distinct from but correlated with group-average GDT Z-score if target difficulty is taken into account, whereas self-scoring is done best by servers and is uncorrelated with GDT performance. Outstanding individual models on specific targets are identified and discussed. Predictor groups excelled at different aspects, highlighting the diversity of current methodologies. However, good full-model scores correlate robustly with high Cα accuracy.
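The ranking step described above (a group-average Z-score over the six full-model measures, averaged with the group-average GDT Z-score) can be illustrated with a minimal Python sketch. This is not the assessors' actual procedure: the function names and data layout are hypothetical, and the sketch assumes that every raw score is oriented so that higher is better (a measure such as the MolProbity score, where lower is better, would need its sign flipped first), that Z-scores are taken per target across groups using the population standard deviation, and that no outlier trimming or model filtering is applied.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Z-score each group's raw value against all groups on one target.

    `values` maps group name -> raw score (assumed: higher is better).
    """
    mu = mean(values.values())
    sigma = pstdev(values.values())
    if sigma == 0:
        # All groups tied on this target: no group stands out.
        return {group: 0.0 for group in values}
    return {group: (v - mu) / sigma for group, v in values.items()}


def group_average_z(per_target):
    """Average each group's per-target Z-scores.

    `per_target` maps target -> {group -> raw score}.
    """
    collected = {}
    for scores in per_target.values():
        for group, z in z_scores(scores).items():
            collected.setdefault(group, []).append(z)
    return {group: mean(zs) for group, zs in collected.items()}


def overall_ranking(gdt, full_model):
    """Combine GDT and full-model Z-scores into one ranked list.

    `gdt` maps target -> {group -> GDT score}.
    `full_model` maps metric name -> (target -> {group -> raw score}).
    Assumes every group in `gdt` has a value for every full-model metric.
    Returns (group, combined score) pairs, best first.
    """
    gdt_z = group_average_z(gdt)
    metric_z = [group_average_z(m) for m in full_model.values()]
    combined = {}
    for group, gz in gdt_z.items():
        full_model_z = mean(mz[group] for mz in metric_z)
        combined[group] = (gz + full_model_z) / 2
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `overall_ranking(gdt, full_model)` with one dictionary of GDT scores and one dictionary per full-model measure returns the groups sorted by the combined score. Giving the GDT term and the full-model term equal weight follows the averaging stated in the abstract; everything else here is an illustrative choice.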