Criscuolo Alexis, Michel Christian J
Equipe de Bioinformatique Théorique, LSIIT, FDBT (UMR CNRS-ULP 7005), Université de Strasbourg, Pôle API, Boulevard Sébastien Brant, 67400, Illkirch, France.
J Mol Evol. 2009 Apr;68(4):377-92. doi: 10.1007/s00239-009-9212-y. Epub 2009 Mar 24.
We develop a new approach, based on a codon evolutionary model, for estimating a matrix of pairwise evolutionary distances from a codon-based alignment. The method first computes a standard distance matrix for each of the three codon positions. These three distance matrices are then weighted according to an estimate of the global evolutionary rate of each codon position and averaged into a single distance matrix. Using a large set of both real and simulated codon-based alignments of nucleotide sequences, we show that this approach yields distance matrices with significantly better treelikeness than those obtained from standard nucleotide evolutionary distances. We also propose an alternative weighting that eliminates part of the noise often associated with some codon positions, particularly the third position, which is known to evolve at a fast rate. Simulation results show that fast distance-based tree reconstruction algorithms applied to distance matrices built with this codon position weighting can produce phylogenetic trees that are at least as accurate as, if not more accurate than, those inferred by maximum likelihood. Finally, we reanalyze a well-known multigene dataset composed of eight yeast species and 106 codon-based alignments and show that our codon evolutionary distances yield a phylogenetic tree that is similar to those obtained by non-distance-based methods (e.g., maximum parsimony and maximum likelihood) and significantly improved compared to the tree obtained from standard nucleotide evolutionary distance estimates.
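The three steps described in the abstract (one distance matrix per codon position, a rate-based weighting, then an average) can be illustrated with a minimal sketch. The abstract does not give the weighting formula, so everything specific here is an assumption made for illustration: the Jukes-Cantor distance stands in for the "standard distance", the relative rate of a position is approximated by its mean pairwise distance divided by the mean over the retained positions, and each matrix is divided by that rate before averaging. The function names (`jc69`, `position_matrix`, `codon_distance`) and the `keep` parameter are hypothetical and do not come from the paper.

```python
import math
from itertools import combinations


def jc69(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned nucleotide strings.

    Sites where either sequence has a gap or ambiguous character are skipped.
    """
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a in "ACGT" and b in "ACGT"]
    if not pairs:
        return 0.0
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def position_matrix(seqs, pos):
    """Pairwise distance matrix using only codon position pos (0, 1 or 2)."""
    n = len(seqs)
    columns = [s[pos::3] for s in seqs]  # assumes an in-frame alignment
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = jc69(columns[i], columns[j])
    return d


def mean_offdiag(d):
    """Mean of the off-diagonal entries, used here as a crude global rate proxy."""
    vals = [d[i][j] for i, j in combinations(range(len(d)), 2)]
    return sum(vals) / len(vals) if vals else 0.0


def codon_distance(seqs, keep=(0, 1, 2)):
    """Average the per-position matrices after rescaling each by its relative rate.

    ASSUMPTION: this weighting is illustrative, not the authors' formula.
    Passing keep=(0, 1) discards the third codon position entirely.
    """
    n = len(seqs)
    mats = [position_matrix(seqs, k) for k in keep]
    used = [(m, mean_offdiag(m)) for m in mats]
    used = [(m, mu) for m, mu in used if mu > 0]  # invariant positions add nothing
    combined = [[0.0] * n for _ in range(n)]
    if not used:
        return combined
    overall = sum(mu for _, mu in used) / len(used)
    for m, mu in used:
        rate = mu / overall  # relative rate of this codon position
        for i in range(n):
            for j in range(n):
                combined[i][j] += m[i][j] / rate / len(used)
    return combined


# Toy in-frame alignment (hypothetical data, six codons per sequence).
toy_alignment = [
    "ATGGCTAAAGGTCTGACC",
    "ATGGCCAAAGGTCTGACA",
    "ATGGCTAAGGGTTTGACC",
    "ATGTCTAAAGGCCTGACC",
]
all_positions = codon_distance(toy_alignment)
first_only = codon_distance(toy_alignment, keep=(0,))
```

The resulting matrix could then be passed to any distance-based tree reconstruction method. Dropping a position through `keep` is only a blunt stand-in for the alternative weighting mentioned in the abstract, whose exact form is not stated there.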