School of Life Sciences, University of Hawai'i, Honolulu, HI.
Mol Biol Evol. 2021 Apr 13;38(4):1627-1640. doi: 10.1093/molbev/msaa295.
Nearly all current Bayesian phylogenetic applications rely on Markov chain Monte Carlo (MCMC) methods to approximate the posterior distribution for trees and other parameters of the model. These approximations are only reliable if Markov chains adequately converge and sample from the joint posterior distribution. Although several studies of phylogenetic MCMC convergence exist, these have focused on simulated data sets or select empirical examples. Therefore, much that is considered common knowledge about MCMC in empirical systems derives from a relatively small family of analyses under ideal conditions. To address this, we present an overview of commonly applied phylogenetic MCMC diagnostics and an assessment of patterns of these diagnostics across more than 18,000 empirical analyses. Many analyses appeared to perform well, and failures in convergence were most likely to be detected using the average standard deviation of split frequencies, a diagnostic that compares topologies among independent chains. Different diagnostics yielded different information about failed convergence, demonstrating that multiple diagnostics must be employed to reliably detect problems. The number of taxa and average branch lengths in analyses have clear impacts on MCMC performance, with more taxa and shorter branches leading to more difficult convergence. We show that the usage of models that include both Γ-distributed among-site rate variation and a proportion of invariable sites is not broadly problematic for MCMC convergence but is also unnecessary. Changes to heating and the usage of model-averaged substitution models can both offer improved convergence in some cases, but neither is a panacea.
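As an illustration of the topology diagnostic named in the abstract, the average standard deviation of split frequencies (ASDSF) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in the study or in any particular software package: each sampled tree is assumed to be already reduced to a set of splits (here, frozensets of taxon names), the function names `split_frequencies` and `asdsf` are hypothetical, the population standard deviation is an assumption (tools may use the sample form), and the `min_freq` cutoff of 0.1 is an assumed default for ignoring very rare splits.

```python
from collections import Counter
from statistics import pstdev


def split_frequencies(trees):
    """Return the fraction of sampled trees in one run that contain each split.

    `trees` is a list of trees, each given as a set of splits
    (assumed representation: frozensets of taxon names).
    """
    counts = Counter(split for tree in trees for split in tree)
    return {split: n / len(trees) for split, n in counts.items()}


def asdsf(runs, min_freq=0.1):
    """Average standard deviation of split frequencies across independent runs.

    `runs` is a list of runs, each a list of trees as described above.
    Splits are kept only if they reach `min_freq` in at least one run
    (assumed cutoff). Population standard deviation is an assumption.
    """
    freqs = [split_frequencies(trees) for trees in runs]
    all_splits = set().union(*freqs)  # every split seen in any run
    kept = [
        s for s in all_splits
        if any(f.get(s, 0.0) >= min_freq for f in freqs)
    ]
    if not kept:
        return 0.0
    # A split absent from a run has frequency 0 in that run.
    total = sum(pstdev([f.get(s, 0.0) for f in freqs]) for s in kept)
    return total / len(kept)
```

Values near zero indicate that independent chains sample splits at similar frequencies; larger values suggest the chains disagree about topology. Any threshold used to declare convergence is a separate choice and is not fixed by this sketch.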