Benkert Pascal, Schwede Torsten, Tosatto Silvio C E
Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50/70, 4056 Basel, Switzerland.
BMC Struct Biol. 2009 May 20;9:35. doi: 10.1186/1472-6807-9-35.
The selection of the most accurate protein model from a set of alternatives is a crucial step in protein structure prediction, in both template-based and ab initio approaches. Scoring functions have been developed which can either return a quality estimate for a single model or derive a score from the information contained in the ensemble of models for a given sequence. Local structural features occurring more frequently in the ensemble have a greater probability of being correct. Within the context of the CASP experiment, these so-called consensus methods have been shown to perform considerably better in selecting good candidate models, but they tend to fail if the best models are far from the dominant structural cluster. In this paper we show that model selection can be improved if both approaches are combined by pre-filtering the models used during the calculation of the structural consensus.
Our recently published QMEAN composite scoring function has been improved by including an all-atom interaction potential term. The preliminary model ranking based on the new QMEAN score is used to select a subset of reliable models against which the structural consensus score is calculated. This scoring function, called QMEANclust, achieves a correlation coefficient between predicted quality score and GDT_TS of 0.9, averaged over the 98 CASP7 targets, and performs significantly better in selecting good models from the ensemble of server models than any other group participating in the quality estimation category of CASP7. Both scoring functions are also benchmarked on the MOULDER test set, consisting of 20 target proteins, each with 300 alternative models generated by MODELLER. QMEAN outperforms all other tested scoring functions operating on individual models, while the consensus method QMEANclust only works properly on decoy sets containing a certain fraction of near-native conformations. We also present a local version of QMEAN for the per-residue estimation of model quality (QMEANlocal) and compare it to a new local consensus-based approach.
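The pre-filtering idea described above can be illustrated with a minimal sketch. It is not the published QMEANclust implementation: the function name `prefiltered_consensus`, the `keep_fraction` parameter, the use of a precomputed pairwise similarity table, and the toy numbers are all assumptions made for illustration. The abstract does not state how the reference subset is sized or which structural similarity measure is used.

```python
from typing import Dict


def prefiltered_consensus(
    single_model_scores: Dict[str, float],
    similarity: Dict[str, Dict[str, float]],
    keep_fraction: float = 0.5,
) -> Dict[str, float]:
    """Score each model by its mean similarity to a pre-filtered reference set.

    single_model_scores: hypothetical single-model quality scores (higher = better).
    similarity: hypothetical pairwise structural similarity in [0, 1].
    keep_fraction: assumed fraction of top-ranked models used as references.
    """
    if len(single_model_scores) < 2:
        raise ValueError("need at least two models to compute a consensus")

    # Step 1: preliminary ranking by the single-model score.
    ranked = sorted(single_model_scores, key=single_model_scores.get, reverse=True)

    # Step 2: keep the top-ranked subset (at least two, so that every model
    # has at least one reference other than itself).
    n_keep = max(2, round(keep_fraction * len(ranked)))
    reference = ranked[:n_keep]

    # Step 3: consensus score = mean similarity to the reference models,
    # excluding the model itself.
    consensus = {}
    for model in single_model_scores:
        others = [ref for ref in reference if ref != model]
        consensus[model] = sum(similarity[model][ref] for ref in others) / len(others)
    return consensus


# Toy ensemble (invented numbers): A and B are near-native but outnumbered
# by a larger cluster of mutually similar, lower-quality models C, D and E.
cluster = {"A": "near_native", "B": "near_native", "C": "decoy", "D": "decoy", "E": "decoy"}
scores = {"A": 0.80, "B": 0.70, "C": 0.40, "D": 0.35, "E": 0.30}
similarity = {
    a: {b: (0.9 if cluster[a] == cluster[b] else 0.3) for b in cluster if b != a}
    for a in cluster
}

plain = prefiltered_consensus(scores, similarity, keep_fraction=1.0)     # no filtering
filtered = prefiltered_consensus(scores, similarity, keep_fraction=0.4)  # top two only

best_plain = max(plain, key=plain.get)
best_filtered = max(filtered, key=filtered.get)
```

In this toy case the unfiltered consensus favours the dominant decoy cluster, whereas restricting the reference set to the two best-ranked models shifts the top consensus score to a near-native model. The sketch only works when the single-model ranking places some near-native models in the reference subset.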
Improved model selection is obtained by using a composite scoring function operating on single models to enrich higher-quality models, which are subsequently used to calculate the structural consensus. The performance of consensus-based methods such as QMEANclust depends strongly on the composition and quality of the model ensemble to be analysed. Therefore, performance estimates for consensus methods based on large meta-datasets (e.g. CASP) might overrate their applicability in more realistic modelling situations with smaller sets of models based on individual methods.