Graduate School of Engineering, Gunma University, Kiryu, Gunma, Japan.
PLoS One. 2013;8(1):e54252. doi: 10.1371/journal.pone.0054252. Epub 2013 Jan 16.
Residue-residue interactions that fold a protein into a unique three-dimensional structure and enable it to perform a specific function impose structural and functional constraints, to varying degrees, on each residue site. Selective constraints on residue sites are recorded in the amino acid orders of homologous sequences and also in the evolutionary trace of amino acid substitutions. A challenge is to extract direct dependences between residue sites by removing phylogenetic correlations and indirect dependences mediated by other residues within a protein or even by other molecules. The rapid growth of protein families with unknown folds requires an accurate de novo prediction method for protein structure. Recent attempts to disentangle direct from indirect dependences of amino acid types between residue positions in multiple sequence alignments have revealed that inferred residue-residue proximities can be sufficient information to predict a protein fold without the use of known three-dimensional structures. Here, we propose an alternative method of inferring coevolving site pairs from concurrent and compensatory substitutions between sites in each branch of a phylogenetic tree. Substitution probability and physico-chemical changes (volume, charge, hydrogen-bonding capability, and others) accompanying substitutions at each site in each branch of a phylogenetic tree are estimated with the likelihood of each substitution, and their direct correlations between sites are used to detect concurrent and compensatory substitutions. To extract direct dependences between sites, partial correlation coefficients of the characteristic changes along branches between sites, in which linear multiple dependences on feature vectors at other sites are removed, are calculated and used to rank coevolving site pairs.
The accuracy of contact prediction based on the present coevolution score is comparable to that achieved by a maximum entropy model of protein sequences for 15 protein families taken from Pfam release 26.0. Moreover, this level of accuracy indicates that compensatory substitutions are significant in protein evolution.
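The partial-correlation step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each site contributes a single scalar characteristic change per branch, whereas the abstract describes feature vectors, and it obtains partial correlations from the inverse of a covariance matrix. The function names `partial_correlations` and `rank_site_pairs`, the `ridge` regularization term, and ranking by signed value are all hypothetical choices made for illustration.

```python
import numpy as np

def partial_correlations(changes, ridge=1e-3):
    """Partial correlation between every pair of sites.

    changes: array of shape (n_branches, n_sites); entry [b, s] is an assumed
    scalar characteristic change at site s along branch b.
    ridge: small diagonal term (an assumption) that keeps the covariance
    matrix invertible when sites outnumber branches.
    """
    x = np.asarray(changes, dtype=float)
    x = x - x.mean(axis=0)                  # center each site's changes
    cov = x.T @ x / x.shape[0]              # site-by-site covariance
    cov += ridge * np.eye(cov.shape[0])     # regularize before inversion
    prec = np.linalg.inv(cov)               # precision matrix
    d = np.sqrt(np.diag(prec))
    pcorr = -prec / np.outer(d, d)          # r_ij = -P_ij / sqrt(P_ii * P_jj)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

def rank_site_pairs(pcorr):
    """Return (i, j, score) for all pairs i < j, highest partial correlation first."""
    i, j = np.triu_indices(pcorr.shape[0], k=1)
    order = np.argsort(-pcorr[i, j])
    return [(int(i[k]), int(j[k]), float(pcorr[i[k], j[k]])) for k in order]
```

In a toy case where sites 0 and 2 each depend on site 1 but not on each other, the ordinary correlation between sites 0 and 2 is high, while their partial correlation is close to zero. This is the sense in which linear dependences on other sites are removed before site pairs are ranked.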