Vera-Ruiz Victor A, Robinson John, Jermiin Lars S
School of Mathematics and Statistics, University of Sydney, NSW 2006, Australia.
Department of Mathematics and Statistics, University of Nevada, Reno, NV 89557, USA.
Syst Biol. 2022 Apr 19;71(3):660-675. doi: 10.1093/sysbio/syab074.
In molecular phylogenetics, it is typically assumed that the evolutionary process for DNA can be approximated by independent and identically distributed Markovian processes at the variable sites and that these processes diverge over the edges of a rooted bifurcating tree. Sometimes the nucleotides are transformed from a 4-state alphabet to a 3- or 2-state alphabet by a procedure that is called recoding, lumping, or grouping of states. Here, we introduce a likelihood-ratio test for lumpability for DNA that has diverged under different Markovian conditions, which assesses the assumption that the Markovian property of the evolutionary process over each edge is retained after recoding of the nucleotides. The test is derived and validated numerically on simulated data. To demonstrate the insights that can be gained by using the test, we assessed two published data sets, one of mitochondrial DNA from a phylogenetic study of the ratites and the other of nuclear DNA from a phylogenetic study of yeast. Our analysis of these data sets revealed that recoding of the DNA eliminated some of the compositional heterogeneity detected over the sequences. However, the Markovian property of the original evolutionary process was not retained by the recoding, leading to some significant distortions of edge lengths in reconstructed trees. [Evolutionary processes; likelihood-ratio test; lumpability; Markovian processes; Markov models; phylogeny; recoding of nucleotides.]
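To make the notion of lumpability concrete, the following minimal sketch checks the classical strong-lumpability condition of Kemeny and Snell for a 4-state nucleotide transition matrix under purine/pyrimidine (RY) recoding: for every pair of blocks, each state in the source block must have the same total probability of moving into the target block. This is an illustration of the concept only, not the likelihood-ratio test described in the abstract. The function names, the state order A, C, G, T, and both example matrices are assumptions chosen for illustration and do not come from the paper.

```python
# Illustrative sketch only: checks the classical (Kemeny-Snell) strong
# lumpability condition on a KNOWN transition matrix. It is not the
# likelihood-ratio test from the paper, which works from sequence data.

def is_lumpable(P, partition, tol=1e-9):
    """Return True if the chain with transition matrix P is strongly
    lumpable with respect to the given partition of state indices."""
    for block in partition:
        for target in partition:
            # Probability of moving from each state of `block` into `target`.
            sums = [sum(P[i][j] for j in target) for i in block]
            if max(sums) - min(sums) > tol:
                return False
    return True


def lump(P, partition):
    """Build the lumped transition matrix; meaningful only when
    is_lumpable(P, partition) is True."""
    return [
        [sum(P[block[0]][j] for j in target) for target in partition]
        for block in partition
    ]


# Hypothetical state order: A=0, C=1, G=2, T=3.
# RY recoding groups purines {A, G} and pyrimidines {C, T}.
RY = [[0, 2], [1, 3]]

# Kimura-2-parameter-like matrix (made-up values): transitions 0.10,
# transversions 0.05 each. Rows sum to 1.
P_K80 = [
    [0.80, 0.05, 0.10, 0.05],
    [0.05, 0.80, 0.05, 0.10],
    [0.10, 0.05, 0.80, 0.05],
    [0.05, 0.10, 0.05, 0.80],
]

# Made-up matrix in which A and G enter the purine block with different
# total probabilities (0.85 versus 0.80), so RY recoding breaks the condition.
P_GENERAL = [
    [0.70, 0.10, 0.15, 0.05],
    [0.05, 0.80, 0.05, 0.10],
    [0.20, 0.10, 0.60, 0.10],
    [0.05, 0.10, 0.05, 0.80],
]

if __name__ == "__main__":
    print(is_lumpable(P_K80, RY))      # True
    print(is_lumpable(P_GENERAL, RY))  # False
    print(lump(P_K80, RY))             # 2-state matrix for the R/Y alphabet
```

When the condition fails, as for the second matrix, the recoded 2-state process is in general no longer Markovian, which is the situation the abstract associates with distorted edge lengths in reconstructed trees.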