Arndt Peter F
Max Planck Institute for Molecular Genetics, Ihnestr. 63, 14195 Berlin, Germany.
Gene. 2007 Apr 1;390(1-2):75-83. doi: 10.1016/j.gene.2006.11.022. Epub 2006 Dec 14.
Maximum likelihood phylogeny reconstruction methods are widely used to uncover and assess the evolutionary history and relationships of natural systems. However, several simplifying assumptions commonly made in this analysis limit the explanatory power of the results obtained. We present an algorithm that performs the phylogenetic analysis without these common assumptions, for sequence data from at least three leaf nodes in a star phylogeny. In particular, the underlying nucleotide substitution model does not have to be reversible and may include neighbor-dependent processes such as the CpG methylation deamination process (CpG-effect). The base compositions of the sequences at the external nodes and of the ancestral sequence may differ from one another, and they do not have to be stationary-state distributions of the corresponding substitution model. The algorithm is able to reconstruct the ancestral base composition and accurately estimate substitution frequencies in the branches of the star phylogeny. Extensive tests on simulated data validate the very favorable performance of the algorithm. As an application, we present the analysis of aligned genomic sequences from human, mouse, and dog. Different substitution patterns can be observed in the three lineages.
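To make the relaxed assumptions concrete, the following Python sketch evolves an ancestral base composition along the three branches of a star phylogeny under a non-reversible substitution model. It is not the algorithm presented in the paper: it only runs the forward model, it performs no likelihood estimation, and it ignores neighbor-dependent processes such as the CpG-effect. The rates, the ancestral composition, the branch lengths, and all function names are hypothetical values chosen for illustration.

```python
BASES = "ACGT"

# Hypothetical substitution rates ("XY" means X -> Y); illustration only.
RATES = {
    "AC": 0.1, "AG": 0.4, "AT": 0.1,
    "CA": 0.2, "CG": 0.1, "CT": 0.9,
    "GA": 0.9, "GC": 0.1, "GT": 0.2,
    "TA": 0.1, "TC": 0.4, "TG": 0.1,
}


def rate_matrix(rates):
    """Build a 4x4 rate matrix whose rows sum to zero."""
    q = [[0.0] * 4 for _ in range(4)]
    for pair, rate in rates.items():
        q[BASES.index(pair[0])][BASES.index(pair[1])] = rate
    for i in range(4):
        q[i][i] = -sum(q[i])  # diagonal is still 0 here, so this sums the off-diagonals
    return q


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, t, squarings=10, terms=12):
    """Approximate exp(q * t) by a truncated Taylor series with scaling and squaring."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def evolve(composition, q, t):
    """Base composition at a leaf after evolving for time t from the ancestor."""
    p = mat_exp(q, t)
    return [sum(composition[i] * p[i][j] for i in range(4)) for j in range(4)]


def stationary(q, t_long=200.0):
    """Approximate stationary distribution: any row of exp(q * t) for large t."""
    return mat_exp(q, t_long)[0]


def detailed_balance_gap(q):
    """Largest violation of pi_i * q_ij == pi_j * q_ji; zero for a reversible model."""
    pi = stationary(q)
    return max(abs(pi[i] * q[i][j] - pi[j] * q[j][i])
               for i in range(4) for j in range(4))


Q = rate_matrix(RATES)
ANCESTRAL = [0.2, 0.3, 0.3, 0.2]  # hypothetical, deliberately not stationary
BRANCH_LENGTHS = {"leaf_1": 0.1, "leaf_2": 0.3, "leaf_3": 0.2}  # hypothetical

if __name__ == "__main__":
    print("detailed-balance gap:", detailed_balance_gap(Q))
    print("stationary:", stationary(Q))
    for leaf, t in BRANCH_LENGTHS.items():
        print(leaf, evolve(ANCESTRAL, Q, t))
```

In this toy setting the detailed-balance gap is nonzero, so the model is not reversible, and each leaf composition differs both from the ancestral composition and from the stationary distribution. An estimation method for such data therefore cannot assume reversibility or stationarity, which is the situation the abstract addresses.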