Eramian David, Shen Min-yi, Devos Damien, Melo Francisco, Sali Andrej, Marti-Renom Marc A
Graduate Group in Biophysics, Department of Biopharmaceutical Sciences, University of California at San Francisco 94158, USA.
Protein Sci. 2006 Jul;15(7):1653-66. doi: 10.1110/ps.062095806. Epub 2006 Jun 2.
Reliable prediction of model accuracy is an important unsolved problem in protein structure modeling. To address this problem, we studied 24 individual assessment scores, including physics-based energy functions, statistical potentials, and machine learning-based scoring functions. The individual scores were also combined into approximately 85,000 composite scoring functions using support vector machine (SVM) regression. The scores were tested for their ability to identify the most native-like models in a set of 6,000 comparative models of 20 representative protein structures. Each of the 20 targets was modeled using a template of less than 30% sequence identity, corresponding to challenging comparative modeling cases. The best SVM score outperformed all individual scores: it decreased the average RMSD difference between the model identified as the best of the set and the model with the lowest RMSD (ΔRMSD) from 0.63 Å to 0.45 Å, and it had a higher Pearson correlation coefficient to RMSD (r = 0.87) than any other tested score. This most accurate score is based on a combination of the DOPE non-hydrogen atom statistical potential; the surface, contact, and combined statistical potentials from MODPIPE; and two PSIPRED/DSSP scores. It was implemented in the SVMod program, which can now be applied to select the final model in various modeling problems, including fold assignment, target-template alignment, and loop modeling.
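The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that a lower score means a better model (as with an energy-like statistical potential) and that the reported average is taken over per-target values. The function names and the toy numbers are hypothetical and are not taken from the paper or from SVMod.

```python
from math import sqrt


def delta_rmsd(scores, rmsds, lower_is_better=True):
    """RMSD of the model the score ranks best, minus the lowest RMSD in the set.

    Assumption: one score and one RMSD (in angstroms) per model of a single target.
    """
    pick = min if lower_is_better else max
    best_idx = pick(range(len(scores)), key=scores.__getitem__)
    return rmsds[best_idx] - min(rmsds)


def mean_delta_rmsd(targets, lower_is_better=True):
    """Average delta-RMSD over targets; each target is a (scores, rmsds) pair."""
    values = [delta_rmsd(s, r, lower_is_better) for s, r in targets]
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data for one target: four models, scores and RMSDs invented.
toy_scores = [-120.0, -150.0, -90.0, -140.0]
toy_rmsds = [3.0, 2.5, 5.0, 2.0]

# The lowest score (-150.0) selects the model at 2.5 A; the best model is at 2.0 A.
toy_delta = delta_rmsd(toy_scores, toy_rmsds)
toy_r = pearson_r(toy_scores, toy_rmsds)
```

A ΔRMSD of zero means the score picked the most accurate model in the set. A correlation close to 1 in magnitude means the score tracks RMSD across the whole set, not only at the top of the ranking.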