Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland.
Toyota Technological Institute at Chicago, Chicago, IL 60637, USA.
Bioinformatics. 2017 Nov 1;33(21):3405-3414. doi: 10.1093/bioinformatics/btx416.
Apart from meta-predictors, most of today's methods for residue-residue contact prediction are based entirely on Direct Coupling Analysis (DCA) of correlated mutations in multiple sequence alignments (MSAs). On average, these methods are ∼40% correct for the 100 strongest predicted contacts in each protein. An end-user working on a single protein of interest has no way of knowing whether the predictions for that protein are much more or much less accurate than 40%, which is especially problematic when predicted contacts are used to steer experimental research on that protein.
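The accuracy figure above is a precision over the top-ranked predictions: of the 100 residue pairs with the strongest DCA scores, what fraction are true contacts. A minimal sketch of that computation follows. It assumes the scores arrive as a dictionary keyed by residue-index pairs (each pair listed once) and that the set of true contacts has already been derived from a known structure; the contact definition itself (distance cutoff, minimum sequence separation) is an assumption left outside the function. The name `top_k_precision` is hypothetical and not taken from the authors' scripts.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[int, int]


def _normalise(pair: Pair) -> Pair:
    # Treat (i, j) and (j, i) as the same residue-residue contact.
    return (min(pair), max(pair))


def top_k_precision(scores: Dict[Pair, float],
                    true_contacts: Set[Pair],
                    k: int = 100) -> float:
    """Fraction of the k highest-scoring pairs that are true contacts.

    Hypothetical helper: `scores` maps each residue pair (listed once) to its
    predicted coupling score; `true_contacts` is assumed to be precomputed
    from an experimental structure.
    """
    truth = {_normalise(p) for p in true_contacts}
    # Rank pairs by score, strongest first, and keep the top k.
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _ in ranked if _normalise(pair) in truth)
    return hits / len(ranked)
```

For example, with scores `{(1, 10): 0.9, (2, 20): 0.8, (3, 30): 0.7, (4, 40): 0.1}`, true contacts `{(10, 1), (3, 30)}` and `k=3`, two of the three strongest pairs are correct, giving a precision of 2/3. When fewer than `k` pairs are scored, this sketch divides by the number actually ranked.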
We designed a regression model that forecasts the accuracy of residue-residue contact prediction for individual proteins with an average error of 7 percentage points. Contacts were predicted with two DCA methods (gplmDCA and PSICOV). The models were built on parameters describing the MSA, the predicted secondary structure, the predicted solvent accessibility and the contact prediction scores for the target protein. Results show that our models can also be applied to meta-methods, as tested on RaptorX.
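To make the idea of forecasting per-protein accuracy concrete, the sketch below fits a plain ordinary-least-squares model that maps a vector of per-protein descriptors to observed top-100 precision (in percent) and scores it by mean absolute error in percentage points. This is an illustrative stand-in, not the authors' model: the regression type, the feature set (for instance, hypothetical columns such as an MSA depth measure, a predicted helix fraction or a mean top contact score) and the reading of "average error" as mean absolute error are all assumptions. The helper names are hypothetical.

```python
import numpy as np


def fit_linear_forecaster(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares with an intercept.

    X: (n_proteins, n_features) matrix of hypothetical per-protein descriptors.
    y: observed precision of the top predicted contacts, in percent.
    Returns [intercept, w1, ..., wd].
    """
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Forecast precision and clip it to the valid 0-100% range."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.clip(A @ coef, 0.0, 100.0)


def mean_abs_error_pp(y_true, y_pred) -> float:
    """Mean absolute difference between observed and forecast precision,
    in percentage points (assumed interpretation of 'average error')."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))
```

As a sanity check on the arithmetic, forecasts of 47% and 43% for proteins whose observed precisions are 40% and 50% are each off by 7 points, so `mean_abs_error_pp([40, 50], [47, 43])` is 7.0. A real forecaster would be fitted on one set of proteins and evaluated on a held-out set.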
All data and scripts are available from http://comprec-lin.iiar.pwr.edu.pl/dcaQ/.
malgorzata.kotulska@pwr.edu.pl.
Supplementary data are available at Bioinformatics online.