Department of Life Sciences, Faculty of Natural Sciences, Imperial College London, London, UK.
Biopolymers. 2023 Mar;114(3):e23530. doi: 10.1002/bip.23530. Epub 2023 Feb 8.
Coevolution between protein residues is normally interpreted as evidence of direct contact. However, the evolutionary record of a protein sequence contains rich information that may also include long-range functional couplings, couplings that report on homo-oligomeric states, or even conformational changes. Because of the complexity of sequence space and the lack of structural information for various members of a protein family, it has been difficult to mine the additional information encoded in a multiple sequence alignment (MSA) effectively. Here, taking advantage of the recent release of the AlphaFold (AF) database, we attempt to identify coevolutionary couplings that cannot be explained simply by spatial proximity. We propose a simple computational method that performs direct coupling analysis (DCA) on an MSA and searches for couplings that are not satisfied in any of the AF models of members of the identified protein family. Application of this method to 2,012 protein families suggests that ~12% of all identified coevolving residue pairs are spatially distant and are more likely than their contacting counterparts to be disordered. We expect that this analysis will help improve the quality of coevolutionary distance restraints used for structure determination and will be useful in identifying potentially functional/allosteric cross-talk between distant residues.
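The filtering step described above (keep a coupling only if it is not satisfied in any AF model of the family) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the DCA scores are taken as given, each model is assumed to be a dict mapping an MSA column to the coordinate of one representative atom (e.g. C-beta), and the 8 Å contact cutoff is an assumed value. The helper names `min_pair_distance` and `unsatisfied_couplings` are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]
# Hypothetical representation: one dict per AF model, mapping an MSA column
# index to the coordinate of a representative atom (e.g. C-beta) of the
# residue aligned to that column. Columns gapped in that member are absent.
Model = Dict[int, Coord]


def min_pair_distance(pair: Tuple[int, int], models: Sequence[Model]) -> Optional[float]:
    """Smallest distance between the two columns over all models, or None if no model covers both."""
    i, j = pair
    distances = [math.dist(m[i], m[j]) for m in models if i in m and j in m]
    return min(distances) if distances else None


def unsatisfied_couplings(
    couplings: Sequence[Tuple[int, int]],
    models: Sequence[Model],
    cutoff: float = 8.0,  # assumed contact cutoff in angstroms
) -> Tuple[List[Tuple[Tuple[int, int], float]], int]:
    """Return couplings not in contact in ANY model, plus the number of couplings that could be evaluated."""
    distant = []
    evaluated = 0
    for pair in couplings:
        d = min_pair_distance(pair, models)
        if d is None:
            continue  # no model covers both columns, so no structural evidence either way
        evaluated += 1
        if d > cutoff:  # even the closest model places the pair beyond contact range
            distant.append((pair, d))
    return distant, evaluated


# Toy example: two models of family members and four top-ranked DCA pairs.
model_a = {1: (0.0, 0.0, 0.0), 5: (3.0, 4.0, 0.0), 9: (20.0, 0.0, 0.0)}
model_b = {1: (0.0, 0.0, 0.0), 5: (30.0, 0.0, 0.0), 9: (12.0, 0.0, 0.0)}
top_couplings = [(1, 5), (1, 9), (5, 9), (1, 42)]  # column 42 is not modelled

distant, evaluated = unsatisfied_couplings(top_couplings, [model_a, model_b])
print([p for p, _ in distant])  # [(1, 9), (5, 9)]
print(f"{len(distant) / evaluated:.2f}")  # 0.67
```

Taking the minimum distance over all models is what makes a coupling count as "distant" only when no family member brings the pair into contact. Pairs that no model covers are skipped rather than counted as distant, so the reported fraction is computed over evaluable pairs only; how the original study treats such pairs is not stated here.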