Fair-balance paradox, star-tree paradox, and Bayesian phylogenetics.

Author information

Yang Ziheng

Affiliation

Department of Biology, Galton Laboratory, University College London, London, UK.

Publication information

Mol Biol Evol. 2007 Aug;24(8):1639-55. doi: 10.1093/molbev/msm081. Epub 2007 May 7.

Abstract

The star-tree paradox refers to the conjecture that the posterior probabilities for the three unrooted trees for four species (or the three rooted trees for three species if the molecular clock is assumed) do not approach 1/3 when the data are generated using the star tree and when the amount of data approaches infinity. It reflects the more general phenomenon of high and presumably spurious posterior probabilities for trees or clades produced by the Bayesian method of phylogenetic reconstruction, and it is perceived to be a manifestation of the deeper problem of the extreme sensitivity of Bayesian model selection to the prior on parameters. Analysis of the star-tree paradox has been hampered by the intractability of the integrals involved. In this article, I use Laplacian expansion to approximate the posterior probabilities for the three rooted trees for three species using binary characters evolving at a constant rate. The approximation enables calculation of posterior tree probabilities for arbitrarily large data sets. Both theoretical analysis of the analogous fair-coin and fair-balance problems and computer simulation for the tree problem confirmed the existence of the star-tree paradox. When the data size n → ∞, the posterior tree probabilities do not converge to 1/3 each, but they vary among data sets according to a statistical distribution. This distribution is characterized. Two strategies for resolving the star-tree paradox are explored: (1) a nonzero prior probability for the degenerate star tree and (2) an increasingly informative prior forcing the internal branch length toward zero. Both appear to be effective in resolving the paradox, but the latter is simpler to implement. The posterior tree probabilities are found to be very sensitive to the prior.
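
The fair-coin analogy mentioned in the abstract can be illustrated with a small simulation. The sketch below is not taken from the article. It assumes one common formulation of the problem: a fair coin is tossed n times, the two competing hypotheses are θ < 1/2 and θ > 1/2, and θ has a uniform prior on (0, 1). The function names, the choice of n, and the number of replicates are all hypothetical. Under these assumptions the posterior probability of θ < 1/2 can be computed exactly, and the simulation shows how it varies among data sets.

```python
import random
from math import comb


def tail_counts(n):
    """Return tail[k] = sum over j >= k of C(n + 1, j), for k = 0 .. n + 2."""
    tail = [0] * (n + 3)
    for k in range(n + 1, -1, -1):
        tail[k] = tail[k + 1] + comb(n + 1, k)
    return tail


def posterior_less_than_half(n, x, tail=None):
    """P(theta < 1/2 | x heads in n tosses), assuming a uniform prior on theta.

    The posterior is Beta(x + 1, n - x + 1). Its CDF at 1/2 equals
    P(Binomial(n + 1, 1/2) >= x + 1), which is evaluated with exact integers.
    """
    if tail is None:
        tail = tail_counts(n)
    return tail[x + 1] / 2 ** (n + 1)


def simulate(n, reps, seed=1):
    """Posterior probabilities for `reps` data sets generated by a fair coin."""
    rng = random.Random(seed)
    tail = tail_counts(n)
    probs = []
    for _ in range(reps):
        # Counting the set bits of n random bits gives a Binomial(n, 1/2) draw.
        x = bin(rng.getrandbits(n)).count("1")
        probs.append(posterior_less_than_half(n, x, tail))
    return probs


if __name__ == "__main__":
    probs = simulate(n=1000, reps=2000)
    # Share of data sets whose posterior probability falls in each quarter
    # of the unit interval.
    for lo in (0.0, 0.25, 0.5, 0.75):
        share = sum(lo <= p < lo + 0.25 for p in probs) / len(probs)
        print(f"[{lo:.2f}, {lo + 0.25:.2f}): {share:.3f}")
```

If the posterior probability converged to 1/2 as n grows, nearly all replicates would fall in the two middle bins. In this toy setting they instead spread across the whole unit interval, which mirrors the behavior the abstract describes for the tree probabilities.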
