Gaut B S, Lewis P O
Department of Statistics, North Carolina State University.
Mol Biol Evol. 1995 Jan;12(1):152-62. doi: 10.1093/oxfordjournals.molbev.a040183.
We used simulated data to investigate several properties of maximum-likelihood (ML) phylogenetic tree estimation for the case of four taxa. Simulated data were generated under a broad range of conditions, including wide variation in branch lengths, differences in the ratio of transition to transversion substitutions, and the absence or presence of gamma-distributed site-to-site rate variation. Data were analyzed in the ML framework with two different substitution models, and we compared the ability of the two models to reconstruct the correct topology. Although both models were inconsistent for some branch-length combinations in the presence of site-to-site rate variation, the models were efficient predictors of topology under most simulation conditions. We also examined the performance of the likelihood ratio (LR) test for a significantly positive interior branch length. This test was found to be misleading under many simulation conditions, rejecting the null hypothesis too often under some of them. Under the null hypothesis of a zero-length internal branch, LR statistics are assumed to be asymptotically distributed as χ²(1); with limited data, the distribution of LR statistics under the null hypothesis deviates from χ²(1).
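The LR test described in the abstract can be illustrated with a minimal sketch. It assumes the usual form of the statistic, twice the difference between the maximized log-likelihoods with and without the zero-length constraint on the internal branch, and refers it to χ²(1). The function names and the log-likelihood values are hypothetical and are not taken from the paper; the χ²(1) tail probability uses the identity P(X > x) = erfc(sqrt(x/2)), which holds for one degree of freedom.

```python
import math


def lr_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Return 2 * (lnL_alt - lnL_null), floored at zero.

    lnl_null: maximized log-likelihood with the internal branch fixed at zero.
    lnl_alt:  maximized log-likelihood with the internal branch free.
    """
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_1_sf(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def lr_test(lnl_null: float, lnl_alt: float, alpha: float = 0.05):
    """Return (statistic, p-value, reject) under the assumed chi-square(1) reference."""
    stat = lr_statistic(lnl_null, lnl_alt)
    p_value = chi2_1_sf(stat)
    return stat, p_value, p_value < alpha


if __name__ == "__main__":
    # Hypothetical log-likelihoods, for illustration only.
    stat, p_value, reject = lr_test(lnl_null=-1002.0, lnl_alt=-1000.0)
    print(f"LR = {stat:.3f}, p = {p_value:.4f}, reject = {reject}")
```

Because the abstract reports that the null distribution of the LR statistic departs from χ²(1) when data are limited, a p-value computed this way should be read as an approximation whose accuracy depends on sequence length and simulation conditions.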