Lake J A
Molecular Biology Institute, University of California, Los Angeles 90024.
Proc Natl Acad Sci U S A. 1994 Feb 15;91(4):1455-9. doi: 10.1073/pnas.91.4.1455.
The reconstruction of phylogenetic trees from DNA and protein sequences is confounded by unequal rate effects. These effects can group rapidly evolving taxa with other rapidly evolving taxa, whether or not they are genealogically related. All algorithms are sensitive to these effects whenever the assumptions on which they are based are not met. The algorithm presented here, called paralinear distances, is valid for a much broader class of substitution processes than previous algorithms and is accordingly less affected by unequal rate effects. It may be used with all nucleic acid, protein, or other sequences, provided that their evolution may be modeled as a succession of Markov processes. The properties of the method have been proven both analytically and by computer simulations. Like all other methods, paralinear distances can fail when sequences are misaligned or when site-to-site sequence variation of rates is extensive. To examine the usefulness of paralinear distances, the "origin of the eukaryotes" has been investigated by the analysis of elongation factor Tu sequences with a variety of sequence alignments. It has been found that the order in which sequences are pairwise aligned strongly determines the topology which is reconstructed by paralinear distances (as it does for all other reconstruction methods tested). When the parts of the alignment that are unaffected by alignment order are analyzed, paralinear distances strongly select the eocyte topology. This provides evidence that the eocyte prokaryotes are the closest prokaryotic relatives of the eukaryotes.
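The abstract does not state the distance formula itself. As a minimal sketch, assuming the commonly cited form of the paralinear distance, d = -ln( det J / sqrt(det D_x · det D_y) ), the calculation for one pair of aligned sequences might look like the following. Here J is the matrix of joint state counts across aligned sites, and D_x and D_y are diagonal matrices of the state counts in each sequence. The function name `paralinear_distance`, its parameters, and the choice to skip sites containing gaps or ambiguous characters are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def paralinear_distance(seq_x, seq_y, alphabet="ACGT"):
    """Paralinear distance between two aligned sequences (illustrative sketch).

    Assumes the form d = -ln(det J / sqrt(det D_x * det D_y)).
    """
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")

    index = {state: i for i, state in enumerate(alphabet)}
    n = len(alphabet)

    # J[a, b] counts the aligned sites where seq_x has state a and seq_y has state b.
    J = np.zeros((n, n))
    for a, b in zip(seq_x.upper(), seq_y.upper()):
        # Assumption: sites with gaps or ambiguous characters are skipped.
        if a in index and b in index:
            J[index[a], index[b]] += 1

    # D_x and D_y are diagonal, so their determinants are products of the
    # row sums and column sums of J, respectively.
    det_dx = np.prod(J.sum(axis=1))
    det_dy = np.prod(J.sum(axis=0))
    det_j = np.linalg.det(J)

    if det_j <= 0 or det_dx == 0 or det_dy == 0:
        raise ValueError("distance is undefined for this pair (singular count matrix)")

    return float(-np.log(det_j / np.sqrt(det_dx * det_dy)))
```

For example, `paralinear_distance("AAAACCCCGGGGTTTT", "AAACCCCGGGGTTTTA")` gives ln 3.2, about 1.163, and the distance from any sequence to itself is zero. A full analysis would compute this for every pair of taxa and pass the resulting distance matrix to a tree-building method; that step is not shown here.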