Brocchieri L
Department of Mathematics, Stanford University, Stanford, California 94305-2125, USA.
Theor Popul Biol. 2001 Feb;59(1):27-40. doi: 10.1006/tpbi.2000.1485.
Conflicting results often accompany phylogenetic analyses of RNA, DNA, or protein sequences across diverse species. Contributing causes include ambiguities in identifying homologous characters in alignments, sensitivity of tree-making methods to unequal evolutionary rates, biases in species sampling, unrecognized paralogy, functional differentiation, loss of phylogenetic information content due to long branches or fast evolution, and difficulties with the assumptions and approximations used to infer phylogenetic relationships. Attempts to surmount these conflicts by averaging over many proteins are problematic because of inherent biases in the selected families, lack of signal in others, and events of lateral transfer, fusion, and/or chimerism. Assessing the reliability of the results with the bootstrap method is strewn with obstacles because of the lack of independence and the inhomogeneity of the molecular data. Problems inherent to the three major procedures for developing phylogenetic trees (parsimony, likelihood, and distance) are reviewed. Special attention is given to the problem of inferring evolutionary distances from patterns of similarity among sequences. The difficulties encountered by methods of phylogenetic reconstruction based on the analysis of divergent sequence families make new methods based on the analysis of complete genomes reasonable alternatives. Several of these are considered, including the signature sequences of Gupta and associates, the study of genome profiles, and the genomic signature set forth by Karlin and colleagues.
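As a rough illustration of why inferring evolutionary distances from observed similarity is delicate, the sketch below computes the proportion of differing sites between two aligned nucleotide sequences and applies the Jukes-Cantor correction. The choice of the Jukes-Cantor model is an assumption made here for simplicity; the abstract does not name a specific distance model, and the function names `p_distance` and `jukes_cantor_distance` are hypothetical.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    # Skip any column where either sequence has a gap character.
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor estimate of substitutions per site from p-distance.

    Assumes equal base frequencies and equal substitution rates
    (an illustrative simplification, not the model used in the paper).
    Returns infinity when p >= 0.75, where the correction is undefined.
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    p = p_distance("ACGTACGT", "ACGTACGA")
    print(p, jukes_cantor_distance(p))
```

Under this model the corrected distance grows without bound as the observed difference approaches 0.75, so small sampling errors in highly divergent sequences translate into large errors in the estimated distance. This is one simple way to see how long branches or fast evolution erode phylogenetic signal.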