Long-branch attraction bias and inconsistency in Bayesian phylogenetics.

Author information

Center for Ecology and Evolutionary Biology, University of Oregon, Eugene, Oregon, United States of America.

Publication information

PLoS One. 2009 Dec 9;4(12):e7891. doi: 10.1371/journal.pone.0007891.

Abstract

Bayesian inference (BI) of phylogenetic relationships uses the same probabilistic models of evolution as its precursor maximum likelihood (ML), so BI has generally been assumed to share ML's desirable statistical properties, such as largely unbiased inference of topology given an accurate model and increasingly reliable inferences as the amount of data increases. Here we show that BI, unlike ML, is biased in favor of topologies that group long branches together, even when the true model and prior distributions of evolutionary parameters over a group of phylogenies are known. Using experimental simulation studies and numerical and mathematical analyses, we show that this bias becomes more severe as more data are analyzed, causing BI to infer an incorrect tree as the maximum a posteriori phylogeny with asymptotically high support as sequence length approaches infinity. BI's long branch attraction bias is relatively weak when the true model is simple but becomes pronounced when sequence sites evolve heterogeneously, even when this complexity is incorporated in the model. This bias--which is apparent under both controlled simulation conditions and in analyses of empirical sequence data--also makes BI less efficient and less robust to the use of an incorrect evolutionary model than ML. Surprisingly, BI's bias is caused by one of the method's stated advantages--that it incorporates uncertainty about branch lengths by integrating over a distribution of possible values instead of estimating them from the data, as ML does. Our findings suggest that trees inferred using BI should be interpreted with caution and that ML may be a more reliable framework for modern phylogenetic analysis.
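The distinction the abstract draws between estimating branch lengths from the data (ML) and integrating over a distribution of possible values (BI) can be illustrated with a deliberately small toy. The sketch below is not the authors' analysis and does not reproduce the long-branch attraction result: it assumes a two-sequence alignment under the Jukes-Cantor (JC69) model with a single branch length `t`, and an exponential prior on `t` whose mean (`prior_mean`) is an arbitrary illustrative choice. All function names are hypothetical. It only shows that the two treatments of `t` produce different scores for the same data.

```python
import math


def jc69_log_likelihood(t, n_sites, n_diff):
    """Log-likelihood of a two-sequence alignment under JC69 at branch length t.

    t is the expected number of substitutions per site separating the sequences;
    n_diff of the n_sites aligned positions differ.
    """
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)  # joint probability of one specific identical pair
    p_diff = 0.25 * (0.25 - 0.25 * e)  # joint probability of one specific mismatched pair
    if n_diff > 0 and p_diff <= 0.0:
        return float("-inf")  # t = 0 cannot explain any observed difference
    ll = (n_sites - n_diff) * math.log(p_same)
    if n_diff > 0:
        ll += n_diff * math.log(p_diff)
    return ll


def ml_branch_length(n_sites, n_diff):
    """Closed-form JC69 maximum-likelihood distance (ML-style point estimate)."""
    frac = n_diff / n_sites
    if frac >= 0.75:
        raise ValueError("JC69 distance is undefined when >= 75% of sites differ")
    return -0.75 * math.log(1.0 - 4.0 * frac / 3.0)


def log_marginal_likelihood(n_sites, n_diff, prior_mean=0.1, t_max=10.0, steps=20000):
    """BI-style score: log of the likelihood averaged over an exponential prior on t.

    Trapezoidal integration on [0, t_max], done in log space for stability.
    prior_mean, t_max and steps are illustrative assumptions.
    """
    h = t_max / steps
    log_terms = []
    for i in range(steps + 1):
        t = i * h
        ll = jc69_log_likelihood(t, n_sites, n_diff)
        if ll == float("-inf"):
            continue  # contributes zero to the integral
        log_prior = -math.log(prior_mean) - t / prior_mean
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        log_terms.append(ll + log_prior + math.log(weight * h))
    m = max(log_terms)
    return m + math.log(sum(math.exp(x - m) for x in log_terms))


if __name__ == "__main__":
    n_sites, n_diff = 1000, 100  # made-up counts, for illustration only
    t_hat = ml_branch_length(n_sites, n_diff)
    print("ML branch length:", t_hat)
    print("maximized log-likelihood:", jc69_log_likelihood(t_hat, n_sites, n_diff))
    print("log marginal likelihood:", log_marginal_likelihood(n_sites, n_diff))
```

Because the marginal likelihood is a prior-weighted average, it can never exceed the maximized likelihood, and how far it falls below depends on the prior. In a real topology comparison each candidate tree has its own branch-length space to maximize or integrate over, which is where, according to the abstract, the two frameworks diverge.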

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/037b/2785476/be6f15cf36dd/pone.0007891.g001.jpg
