Branch length estimation and divergence dating: estimates of error in Bayesian and maximum likelihood frameworks.

Affiliation

Department of Biology, Colorado State University, Fort Collins, CO 80523-1878, USA.

Publication information

BMC Evol Biol. 2010 Jan 11;10:5. doi: 10.1186/1471-2148-10-5.

Abstract

BACKGROUND

Estimates of divergence dates between species improve our understanding of processes ranging from nucleotide substitution to speciation. Such estimates are frequently based on molecular genetic differences between species; therefore, they rely on accurate estimates of the number of such differences (i.e. substitutions per site, measured as branch length on phylogenies). We used simulations to determine the effects of dataset size, branch length heterogeneity, branch depth, and analytical framework on branch length estimation across a range of branch lengths. We then reanalyzed an empirical dataset for plethodontid salamanders to determine how inaccurate branch length estimation can affect estimates of divergence dates.

RESULTS

The accuracy of branch length estimation varied with branch length, dataset size (both number of taxa and sites), branch length heterogeneity, branch depth, dataset complexity, and analytical framework. For simple phylogenies analyzed in a Bayesian framework, branches were increasingly underestimated as branch length increased; in a maximum likelihood framework, longer branch lengths were somewhat overestimated. Longer datasets improved estimates in both frameworks; however, when the number of taxa was increased, estimation accuracy for deeper branches was less than for tip branches. Increasing the complexity of the dataset produced more misestimated branches in a Bayesian framework; however, in an ML framework, more branches were estimated more accurately. Using ML branch length estimates to re-estimate plethodontid salamander divergence dates generally resulted in an increase in the estimated age of older nodes and a decrease in the estimated age of younger nodes.

CONCLUSIONS

Branch lengths are misestimated in both statistical frameworks for simulations of simple datasets. However, for complex datasets, length estimates are quite accurate in ML (even for short datasets), whereas few branches are estimated accurately in a Bayesian framework. Our reanalysis of empirical data demonstrates the magnitude of effects of Bayesian branch length misestimation on divergence date estimates. Because the length of branches for empirical datasets can be estimated most reliably in an ML framework when branches are <1 substitution/site and datasets are ≥1 kb, we suggest that divergence date estimates using datasets, branch lengths, and/or analytical techniques that fall outside of these parameters should be interpreted with caution.
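
The thresholds stated in the conclusions can be turned into a simple screening check. The following is a minimal sketch, not part of the published study: the names `DatingInput` and `caution_reasons` are hypothetical, and treating "an ML framework", "<1 substitution/site", and "≥1 kb" as three independent pass/fail tests is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the abstract's conclusions.
MAX_BRANCH_LENGTH = 1.0      # substitutions per site; branches should be below this
MIN_ALIGNMENT_LENGTH = 1000  # sites; datasets should be at least 1 kb


@dataclass
class DatingInput:
    """Hypothetical summary of a divergence-dating analysis."""
    framework: str               # e.g. "ML" or "Bayesian"
    alignment_length: int        # number of aligned sites
    branch_lengths: List[float]  # estimated branch lengths, substitutions/site


def caution_reasons(analysis: DatingInput) -> List[str]:
    """Return the reasons, if any, that the analysis falls outside the
    parameters under which the abstract reports reliable estimation."""
    reasons = []
    if analysis.framework.strip().upper() != "ML":
        reasons.append("branch lengths not estimated in an ML framework")
    if analysis.alignment_length < MIN_ALIGNMENT_LENGTH:
        reasons.append("dataset shorter than 1 kb")
    n_long = sum(1 for b in analysis.branch_lengths if b >= MAX_BRANCH_LENGTH)
    if n_long:
        reasons.append(f"{n_long} branch(es) at or above 1 substitution/site")
    return reasons


if __name__ == "__main__":
    example = DatingInput("Bayesian", 800, [0.05, 0.4, 1.3])
    for reason in caution_reasons(example):
        print("Interpret with caution:", reason)
```

An empty list means only that the analysis sits inside the stated parameters; it does not by itself guarantee accurate divergence dates.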
