Section of Integrative Biology, University of Texas at Austin, 1 University Station C0930, Austin, TX 78712, USA.
Syst Biol. 2009 Feb;58(1):130-45. doi: 10.1093/sysbio/syp017. Epub 2009 May 22.
Although an increasing number of phylogenetic data sets are incomplete, the effect of ambiguous data on phylogenetic accuracy is not well understood. We use 4-taxon simulations to study the effects of ambiguous data (i.e., missing characters or gaps) in maximum likelihood (ML) and Bayesian frameworks. By introducing ambiguous data in a way that removes confounding factors, we provide the first clear understanding of one mechanism by which ambiguous data can mislead phylogenetic analyses. We find that in both ML and Bayesian frameworks, among-site rate variation can interact with ambiguous data to produce misleading estimates of topology and branch lengths. Furthermore, within a Bayesian framework, priors on branch lengths and rate heterogeneity parameters can exacerbate the effects of ambiguous data, resulting in strongly misleading bipartition posterior probabilities. The magnitude and direction of the ambiguous data bias are a function of the number and taxonomic distribution of ambiguous characters, the strength of topological support, and whether or not the model is correctly specified. The results of this study have major implications for all analyses that rely on accurate estimates of topology or branch lengths, including divergence time estimation, ancestral state reconstruction, tree-dependent comparative methods, rate variation analysis, phylogenetic hypothesis testing, and phylogeographic analysis.
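As background for how ambiguous characters enter a likelihood calculation, the following is a minimal sketch and is not code from the study. It assumes the common convention that a missing character or gap is treated as compatible with every state, and it assumes a JC69 model with no among-site rate variation, which is simpler than the models the abstract discusses. The function names, the tree labelling ((A,B),(C,D)), and the branch lengths are all hypothetical.

```python
import math

STATES = "ACGT"

def jc69_prob(i, j, t):
    """JC69 probability of state i changing to state j along a branch of length t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if i == j else 0.25 - 0.25 * decay

def tip_partial(ch):
    """Conditional likelihoods at a tip; ambiguous characters allow every state."""
    if ch in STATES:
        return [1.0 if s == ch else 0.0 for s in STATES]
    return [1.0, 1.0, 1.0, 1.0]  # '?', '-', or 'N': assumed fully ambiguous

def up_branch(partial, t):
    """Carry a partial-likelihood vector across a branch of length t."""
    return [sum(jc69_prob(i, j, t) * partial[k] for k, j in enumerate(STATES))
            for i in STATES]

def site_likelihood(site, branch_lengths):
    """Likelihood of one site on the unrooted 4-taxon tree ((A,B),(C,D))."""
    t_a, t_b, t_c, t_d, t_internal = branch_lengths
    a, b, c, d = (tip_partial(ch) for ch in site)
    # Combine C and D at their shared internal node.
    m_c, m_d = up_branch(c, t_c), up_branch(d, t_d)
    node_y = [m_c[k] * m_d[k] for k in range(4)]
    # Combine A, B, and the C/D subtree at the other internal node.
    m_a, m_b = up_branch(a, t_a), up_branch(b, t_b)
    m_y = up_branch(node_y, t_internal)
    node_x = [m_a[k] * m_b[k] * m_y[k] for k in range(4)]
    # Average over the equal JC69 stationary frequencies.
    return sum(0.25 * v for v in node_x)

# Hypothetical branch lengths (A, B, C, D, internal), in substitutions per site.
branch_lengths = (0.1, 0.2, 0.1, 0.2, 0.05)
observed = site_likelihood("ACGT", branch_lengths)
with_gap = site_likelihood("ACG-", branch_lengths)
```

Under this convention, `with_gap` equals the sum of the site likelihoods over the four possible nucleotides at taxon D, so a site with an ambiguous character carries no information about the placement of that taxon. The rate-variation and prior effects described in the abstract would require a richer model than this sketch includes.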