Department of Statistics, University of Oxford, Oxford OX1 3LB, UK.
SLAC National Accelerator Laboratory, Stanford University, Menlo Park, CA 94305, USA.
Bioinformatics. 2020 Mar 1;36(6):1750-1756. doi: 10.1093/bioinformatics/btz816.
Over the last few years, the field of protein structure prediction has been transformed by increasingly accurate contact prediction software. These methods are based on the detection of coevolutionary relationships between residues from multiple sequence alignments (MSAs). However, despite speculation, there is little evidence of a link between contact prediction and the physico-chemical interactions which drive amino-acid coevolution. Furthermore, existing protocols predict only a fraction of all protein contacts and it is not clear why some contacts are favoured over others. Using a dataset of 863 protein domains, we assessed the physico-chemical interactions of contacts predicted by CCMpred, MetaPSICOV and DNCON2, as examples of direct coupling analysis, meta-prediction and deep learning.
We considered correctly predicted contacts and compared their properties against the protein contacts that were not predicted. Predicted contacts tend to form more bonds than non-predicted contacts, which suggests these contacts may be more important than contacts that were not predicted. Comparing the contacts predicted by each method, we found that MetaPSICOV and DNCON2 favour accuracy, whereas CCMpred detects contacts with more bonds. This suggests that the push for higher accuracy may lead to a loss of physico-chemically important contacts. These results underscore the connection between protein physico-chemistry and the coevolutionary couplings that can be derived from MSAs. This relationship is likely to be relevant to protein structure prediction and functional analysis of protein structure and may be key to understanding their utility for different problems in structural biology.
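The comparison described above can be illustrated with a minimal sketch. It is not the authors' code: the function name `split_contacts`, the toy residue pairs and the per-contact bond counts are all hypothetical, and it assumes that each true contact has already been annotated with a number of bonds.

```python
from statistics import mean


def split_contacts(true_contacts, predicted_pairs):
    """Split true contacts into correctly predicted and non-predicted sets.

    true_contacts: dict mapping a residue pair (i, j) to a bond count
                   (hypothetical annotation, assumed to be precomputed).
    predicted_pairs: iterable of residue pairs returned by a predictor.
    """
    # Normalise pair order so (10, 1) and (1, 10) refer to the same contact.
    predicted = {tuple(sorted(p)) for p in predicted_pairs}
    normalised = {tuple(sorted(c)): b for c, b in true_contacts.items()}
    hit = {c: b for c, b in normalised.items() if c in predicted}
    miss = {c: b for c, b in normalised.items() if c not in predicted}
    return hit, miss


# Toy data (made up for illustration): residue pair -> number of bonds.
true_contacts = {(1, 10): 3, (2, 15): 1, (4, 20): 2, (5, 30): 0}
# (7, 40) is not a true contact, so it is ignored as a false positive.
predicted_pairs = [(10, 1), (4, 20), (7, 40)]

hit, miss = split_contacts(true_contacts, predicted_pairs)
mean_hit = mean(hit.values())    # average bonds over correctly predicted contacts
mean_miss = mean(miss.values())  # average bonds over non-predicted contacts
print(f"predicted: {mean_hit}, non-predicted: {mean_miss}")
```

In this toy example the correctly predicted contacts carry more bonds on average than the non-predicted ones, which mirrors the direction of the reported trend but says nothing about its size in the real dataset.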
We use publicly available databases. Our code is available for download at https://opig.stats.ox.ac.uk/.
Supplementary information is available at Bioinformatics online.