Department of Statistics, University of Oxford, Oxford OX1 3LB, UK.
Department of Informatics, UCB Pharma, Slough SL1 3WE, UK.
Bioinformatics. 2017 Feb 1;33(3):373-381. doi: 10.1093/bioinformatics/btw618.
Co-evolution methods have been used as contact predictors to identify pairs of residues that are spatially close. Such contact predictors have been compared in terms of the precision of their predictions, but no study has compared their usefulness for model generation.
We compared eight different co-evolution methods on a set of ∼3500 proteins and found that metaPSICOV stage 2 produces, on average, the most precise predictions. The precision of all the methods depends on SCOP class, with most methods predicting contacts poorly in all-α and membrane proteins. The contact predictions were then used to assist in de novo model generation. We found that it was not the method with the highest average precision, but rather metaPSICOV stage 1, whose predictions consistently led to the best models. Our modelling results show a correlation between the proportion of predicted long-range contacts that are satisfied by a model and its quality. We used this proportion to classify models as correct or incorrect; discarding decoys classified as incorrect enriched the proportion of good decoys in our final ensemble by a factor of seven. In 17 of the 18 cases where correct answers were generated, the best models were not discarded by this approach. We were also able to identify eight cases where no correct decoy had been generated.
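The decoy filter described above can be illustrated with a minimal sketch. It is not the authors' implementation: the 8 Å distance cutoff, the minimum sequence separation of 24 residues for a "long-range" contact, the classification threshold of 0.5, and the function names `satisfied_fraction` and `filter_decoys` are all assumptions chosen for illustration.

```python
import math

def satisfied_fraction(coords, predicted_contacts,
                       dist_cutoff=8.0, min_separation=24):
    """Fraction of predicted long-range contacts satisfied by one model.

    coords: dict mapping residue index -> (x, y, z) of a representative atom.
    predicted_contacts: iterable of (i, j) residue-index pairs.
    dist_cutoff and min_separation are assumed values, not taken from the paper.
    """
    # Keep only long-range pairs for which both residues are present in the model.
    long_range = [(i, j) for i, j in predicted_contacts
                  if abs(i - j) >= min_separation and i in coords and j in coords]
    if not long_range:
        return 0.0
    # A predicted contact is satisfied if the two residues lie within the cutoff.
    satisfied = sum(1 for i, j in long_range
                    if math.dist(coords[i], coords[j]) <= dist_cutoff)
    return satisfied / len(long_range)

def filter_decoys(decoys, predicted_contacts, threshold=0.5):
    """Return names of decoys classified as correct.

    decoys: dict mapping decoy name -> coords dict (as in satisfied_fraction).
    threshold is a hypothetical cutoff on the satisfied fraction.
    """
    return [name for name, coords in decoys.items()
            if satisfied_fraction(coords, predicted_contacts) >= threshold]

# Toy example with made-up coordinates.
contacts = [(1, 30), (1, 60), (1, 2)]  # (1, 2) is not long-range and is ignored
decoy_a = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0),
           30: (5.0, 0.0, 0.0), 60: (20.0, 0.0, 0.0)}
decoy_b = {1: (0.0, 0.0, 0.0), 2: (1.0, 0.0, 0.0),
           30: (15.0, 0.0, 0.0), 60: (20.0, 0.0, 0.0)}
kept = filter_decoys({"decoy_a": decoy_a, "decoy_b": decoy_b}, contacts)
```

In this toy case `decoy_a` satisfies one of the two long-range contacts and is kept, while `decoy_b` satisfies neither and is discarded. In practice the threshold would need to be calibrated against models of known quality.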
Data are available for download from: http://opig.stats.ox.ac.uk/resources.
Supplementary data are available at Bioinformatics online.