Department of Computer Science, Dartmouth College, Hanover, NH 03755, United States of America.
Department of Biological Sciences, Dartmouth College, Hanover, NH 03755, United States of America.
PLoS One. 2018 Jun 28;13(6):e0199585. doi: 10.1371/journal.pone.0199585. eCollection 2018.
Co-evolution between pairs of residues in a multiple sequence alignment (MSA) of homologous proteins has long been proposed as an indicator of structural contacts. Recently, several methods, such as direct-coupling analysis (DCA) and MetaPSICOV, have been shown to achieve impressive contact prediction rates by taking advantage of considerable sequence data. In this paper, we show that prediction success rates are highly sensitive to the structural definition of a contact. More permissive definitions (i.e., those classifying more pairs as true contacts) naturally lead to higher positive predictive rates, but at the expense of the amount of structural information each contact contributes. Thus, the remaining limitations of contact prediction algorithms are most noticeable for geometrically restrictive contacts, precisely those that contribute more information to structure prediction. We suggest that, to improve prediction rates for such "informative" contacts, one could combine co-evolution scores with additional indicators of contact likelihood. Specifically, we find that when a pair of co-varying positions in an MSA is occupied by residue pairs with favorable statistical contact energies, that pair is more likely to represent a true contact. We show that combining a contact potential metric with DCA or MetaPSICOV performs considerably better than DCA or MetaPSICOV alone, respectively. This holds regardless of contact definition, but especially for stricter, more informative contact definitions. In summary, this work outlines some remaining challenges to be addressed in contact prediction and proposes and validates a promising direction for improvement.
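The abstract does not say how the contact potential metric is computed or how it is combined with DCA or MetaPSICOV scores. The following minimal Python sketch illustrates the two ideas under stated assumptions. The first idea is that a column pair whose residue pairs have favorable (low) statistical contact energies scores higher. The second is that loosening a distance-based contact definition can only raise positive predictive value (PPV) for a fixed set of predictions.

Several elements of the sketch are assumptions rather than the authors' method:

- The energy table `TOY_ENERGY` holds made-up placeholder values, not a published potential.
- The helpers `contact_potential`, `combined_score` and `ppv` are hypothetical.
- The weighted-sum combination is an assumption.
- Contacts are assumed to be defined by a single distance cutoff in angstroms.

```python
from statistics import mean

# HYPOTHETICAL placeholder contact energies (arbitrary units; lower = more
# favorable). These are NOT values from a published statistical potential.
TOY_ENERGY = {
    ("L", "L"): -2.0, ("L", "I"): -1.8, ("I", "I"): -1.7,
    ("K", "E"): -1.0, ("K", "K"): 0.5, ("E", "E"): 0.6,
}


def pair_energy(a, b, table):
    """Symmetric lookup of a residue-pair energy; unlisted pairs score 0.0."""
    return table.get((a, b), table.get((b, a), 0.0))


def contact_potential(msa, i, j, table):
    """Mean contact energy of the residue pairs found at MSA columns i and j,
    skipping sequences with a gap ('-') at either column."""
    energies = [pair_energy(seq[i], seq[j], table)
                for seq in msa if seq[i] != "-" and seq[j] != "-"]
    return mean(energies) if energies else 0.0


def combined_score(coevolution, potential, weight=1.0):
    """ASSUMED linear combination: a high co-evolution score (e.g. from DCA)
    and a favorable (negative) mean energy both raise the combined score."""
    return coevolution - weight * potential


def ppv(predicted, distance, cutoff):
    """Positive predictive value: fraction of predicted pairs whose
    inter-residue distance is below the contact cutoff (assumed angstroms)."""
    hits = sum(1 for pair in predicted if distance[pair] < cutoff)
    return hits / len(predicted)


if __name__ == "__main__":
    msa = ["KLEL", "KIEL", "-LEI"]
    print(contact_potential(msa, 0, 2, TOY_ENERGY))  # -1.0 (gapped row skipped)
    print(combined_score(0.5, contact_potential(msa, 0, 2, TOY_ENERGY)))  # 1.5

    # The same three predictions scored against stricter and looser cutoffs.
    predicted = [(0, 2), (1, 3), (0, 3)]
    distance = {(0, 2): 5.0, (1, 3): 7.0, (0, 3): 11.0}
    for cutoff in (6.0, 8.0, 12.0):
        print(cutoff, ppv(predicted, distance, cutoff))
```

With these toy distances, PPV rises from 1/3 to 2/3 to 1.0 as the cutoff loosens. This mirrors the abstract's point that permissive definitions inflate success rates. The linear combination was chosen only for readability. A learned combination (for example, a logistic regression over both scores) would be an equally plausible design.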