Yang Ziheng, Rannala Bruce
Department of Biology, University College London, Darwin Building, Gower Street, London WC1E 6BT, United Kingdom.
Syst Biol. 2005 Jun;54(3):455-70. doi: 10.1080/10635150590945313.
The Bayesian method for estimating species phylogenies from molecular sequence data provides an attractive alternative to maximum likelihood with nonparametric bootstrap, owing to the easy interpretation of posterior probabilities for trees and to the availability of efficient computational algorithms. However, for many data sets it produces extremely high posterior probabilities, sometimes for apparently incorrect clades. Here we use both computer simulation and empirical data analysis to examine the effect of the prior model for internal branch lengths. We found that posterior probabilities for trees and clades are sensitive to the prior for internal branch lengths, and that priors assuming long internal branches cause high posterior probabilities for trees. In particular, uniform priors with high upper bounds bias Bayesian clade probabilities in favor of extreme values. We discuss possible remedies to the problem, including empirical and full Bayesian methods and subjective procedures suggested in Bayesian hypothesis testing. Our results also suggest that the bootstrap proportion and Bayesian posterior probability are different measures of accuracy, and that the bootstrap proportion, if interpreted as the probability that the clade is true, can be either too liberal or too conservative.