Pollock DD, Taylor WR, Goldman N
Division of Mathematical Biology, National Institute for Medical Research, The Ridgeway, Mill Hill, London, NW7 1AA, UK.
J Mol Biol. 1999 Mar 19;287(1):187-98. doi: 10.1006/jmbi.1998.2601.
The identification of protein sites undergoing correlated evolution (coevolution) is of great interest due to the possibility that these pairs will tend to be adjacent in the three-dimensional structure. Identification of such pairs should provide useful information for understanding the evolutionary process, predicting the effects of site-directed substitution, and potentially for predicting protein structure. Here, we develop and apply a maximum likelihood method with the aim of improving detection of coevolution. Unlike previous methods, which have had limited success, this method allows for correlations induced by phylogenetic relationships and for variation in rate of evolution along branches, and does not rely on accurate reconstruction of ancestral nodes. In order to reduce the complexity of coevolutionary relationships and identify the primary component of pairwise coevolution between two sites, we reduce the data to a two-state system at each site, regardless of the actual number of residues observed at that site. Simulations show that this strategy is good at identifying simple correlations and at recognizing cases in which the data are insufficient to distinguish between coevolution and spurious correlations. The new method was tested by using size and charge characteristics to group the residues at each site, and then evaluating coevolution in myoglobin sequences. Grouping based on physicochemical characteristics allows categorization of coevolving sites into positive and negative coevolution, depending on the correlation between equilibrium state frequencies. We detected a striking excess of negative coevolution (corresponding to charge) at sites brought into proximity by the periodicity of the alpha-helix, and there was also a tendency for sites with significant likelihood ratios to be close in the three-dimensional structure. Sites on the surface of the protein appear to coevolve both when they are close in the structure and when they are distant, implying a role for folding and/or avoidance of quaternary structure in the coevolution process.
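To make the abstract's pipeline concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. Each residue is collapsed to a binary state. A pair of sites is modeled as a 4-state continuous-time Markov chain on the phylogeny, and the likelihood under a dependent (coevolution) model is compared with one in which the sites evolve independently. The sign of pi00*pi11 - pi01*pi10 labels coevolution as positive or negative. The charge grouping (D, E, K, R), the rate parameterization in the hypothetical `pair_rate_matrix`, the tree encoding, all function names, and the hand-picked parameter values are illustrative assumptions. Maximizing each model's likelihood and choosing a significance threshold are deliberately omitted.

```python
import math

# Hypothetical charge-based grouping (illustrative only): state 1 = charged
# residue (D, E, K, R), state 0 = any other residue.
CHARGED = set("DEKR")


def to_two_state(residue: str) -> int:
    """Collapse a one-letter amino-acid code to a binary state."""
    return 1 if residue.upper() in CHARGED else 0


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(q, t, squarings=10, terms=20):
    """exp(Q t) by a truncated Taylor series with scaling and squaring."""
    n = len(q)
    a = [[x * t / 2 ** squarings for x in row] for row in q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]  # A^k / k!
        result = [[r + s for r, s in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def pair_rate_matrix(pi, lam_a, lam_b):
    """Assumed 4-state rate matrix for two binary sites; joint state s = 2*a + b.

    pi[s] is the joint equilibrium frequency. A change at one site moves it
    toward its equilibrium frequency conditional on the other site's current
    state; simultaneous double changes have rate zero. The chain is
    time-reversible with stationary distribution pi.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for a in (0, 1):
        for b in (0, 1):
            s = 2 * a + b
            flip_a, flip_b = 2 * (1 - a) + b, 2 * a + (1 - b)
            q[s][flip_a] = lam_a * pi[flip_a] / (pi[b] + pi[2 + b])
            q[s][flip_b] = lam_b * pi[flip_b] / (pi[2 * a] + pi[2 * a + 1])
            q[s][s] = -(q[s][flip_a] + q[s][flip_b])
    return q


def independent_pi(pa1, pb1):
    """Joint frequencies implied by two independently evolving sites."""
    return [(1 - pa1) * (1 - pb1), (1 - pa1) * pb1, pa1 * (1 - pb1), pa1 * pb1]


def partials(node, q):
    """Felsenstein pruning. A leaf is an int joint state; an internal node
    is a list of (child, branch_length) pairs."""
    if isinstance(node, int):
        return [1.0 if s == node else 0.0 for s in range(4)]
    out = [1.0] * 4
    for child, t in node:
        p, below = expm(q, t), partials(child, q)
        for s in range(4):
            out[s] *= sum(p[s][x] * below[x] for x in range(4))
    return out


def log_likelihood(tree, pi, lam_a, lam_b):
    root = partials(tree, pair_rate_matrix(pi, lam_a, lam_b))
    return math.log(sum(pi[s] * root[s] for s in range(4)))


def coevolution_sign(pi, tol=1e-12):
    """+1 if the joint equilibrium frequencies are positively correlated
    (pi00*pi11 > pi01*pi10), -1 if negatively, 0 if independent."""
    d = pi[0] * pi[3] - pi[1] * pi[2]
    return 0 if abs(d) <= tol else (1 if d > 0 else -1)


# Toy usage: a hypothetical four-taxon tree with (site A, site B) residues.
tips = [("E", "K"), ("D", "R"), ("A", "G"), ("S", "L")]
c = [2 * to_two_state(x) + to_two_state(y) for x, y in tips]
tree = [([(c[0], 0.1), (c[1], 0.2)], 0.05), ([(c[2], 0.1), (c[3], 0.3)], 0.05)]
# Parameter values are fixed by hand here; a real analysis maximizes each
# model's likelihood over its parameters (and branch lengths) first.
ln_ind = log_likelihood(tree, independent_pi(0.5, 0.5), 1.0, 1.0)
ln_dep = log_likelihood(tree, [0.45, 0.05, 0.05, 0.45], 1.0, 1.0)
lr = 2 * (ln_dep - ln_ind)
print(f"LR = {lr:.3f}, sign = {coevolution_sign([0.45, 0.05, 0.05, 0.45])}")
```

Setting the joint frequencies to `independent_pi(...)` makes the assumed dependent chain factor into two independent binary chains. The independent model is therefore nested inside the dependent one, which is what makes a likelihood ratio comparison meaningful. Pruning sums over all possible ancestral states at internal nodes rather than reconstructing them, in the spirit of the abstract's point that the method does not rely on ancestral reconstruction. For this perfectly correlated toy pattern, the dependent parameter values give the higher likelihood.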