Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius, Lithuania.
Genome Center, UC Davis, 451 Health Sciences Drive, Davis, CA, USA.
Bioinformatics. 2019 Mar 15;35(6):937-944. doi: 10.1093/bioinformatics/bty760.
Measuring discrepancies between protein models and native structures is at the heart of the development of protein structure prediction methods and the comparison of their performance. A number of different evaluation methods have been developed; however, no comprehensive and unbiased comparison of these methods has been performed.
We carried out a comparative analysis of several popular model assessment methods (RMSD, TM-score, GDT, QCS, CAD-score, LDDT, SphereGrinder and RPF) to reveal their relative strengths and weaknesses. The analysis, performed on a large and diverse model set derived from the three latest community-wide CASP experiments (CASP10-12), had two major directions. First, we looked at general differences between the scores by analyzing the distribution, correspondence and correlation of their values, as well as differences in how they select the best models. Second, we examined the score differences taking into account various structural properties of the models (stereochemistry, hydrogen bonds, packing of domains and chain fragments, missing residues, protein length and secondary structure). Our results provide a solid basis for an informed selection of the most appropriate score, or combination of scores, depending on the task at hand.
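To make the kind of score difference discussed above concrete, the following is a minimal sketch of three of the listed measures (RMSD, a GDT_TS-like score and a TM-score-like score). It is illustrative only and is not the procedure used in the study. It assumes the model has already been superimposed on the native structure and that per-residue Cα distances in ångströms are available; the real GDT and TM-score programs additionally search over superpositions, which this sketch does not. The function names and the toy data are hypothetical.

```python
import math


def rmsd(distances):
    """Root-mean-square deviation over per-residue distances (in angstroms)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def gdt_ts_fixed(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS-like score (0-100) for ONE fixed superposition.

    Averages the fraction of residues within each distance cutoff.
    Assumption: no search over superpositions is done, so this is
    only a lower bound on what a real GDT_TS program would report.
    """
    n = len(distances)
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


def tm_score_fixed(distances, target_length):
    """TM-score-like value for ONE fixed superposition and alignment.

    Uses the standard length-dependent scale
    d0 = 1.24 * (L - 15)^(1/3) - 1.8.
    Assumption: target_length is large enough (well above 15) for d0
    to be positive; real implementations clamp d0 for short targets.
    """
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Hypothetical toy data: a 100-residue target, every residue 1 A off,
# versus the same model with a single residue displaced by 30 A.
uniform = [1.0] * 100
one_outlier = [1.0] * 99 + [30.0]

for name, dists in (("uniform", uniform), ("one outlier", one_outlier)):
    print(
        name,
        round(rmsd(dists), 2),
        round(gdt_ts_fixed(dists), 2),
        round(tm_score_fixed(dists, target_length=100), 3),
    )
```

In this toy case a single badly placed residue roughly triples the RMSD while the two threshold- and scale-based scores change only slightly, which is one reason different measures can disagree about which model is best.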
Supplementary data are available at Bioinformatics online.