Properties of Markov Chain Monte Carlo Performance across Many Empirical Alignments.

Affiliation

School of Life Sciences, University of Hawai'i, Honolulu, HI.

Publication information

Mol Biol Evol. 2021 Apr 13;38(4):1627-1640. doi: 10.1093/molbev/msaa295.

Abstract

Nearly all current Bayesian phylogenetic applications rely on Markov chain Monte Carlo (MCMC) methods to approximate the posterior distribution for trees and other parameters of the model. These approximations are only reliable if Markov chains adequately converge and sample from the joint posterior distribution. Although several studies of phylogenetic MCMC convergence exist, these have focused on simulated data sets or select empirical examples. Therefore, much that is considered common knowledge about MCMC in empirical systems derives from a relatively small family of analyses under ideal conditions. To address this, we present an overview of commonly applied phylogenetic MCMC diagnostics and an assessment of patterns of these diagnostics across more than 18,000 empirical analyses. Many analyses appeared to perform well and failures in convergence were most likely to be detected using the average standard deviation of split frequencies, a diagnostic that compares topologies among independent chains. Different diagnostics yielded different information about failed convergence, demonstrating that multiple diagnostics must be employed to reliably detect problems. The number of taxa and average branch lengths in analyses have clear impacts on MCMC performance, with more taxa and shorter branches leading to more difficult convergence. We show that the usage of models that include both Γ-distributed among-site rate variation and a proportion of invariable sites is not broadly problematic for MCMC convergence but is also unnecessary. Changes to heating and the usage of model-averaged substitution models can both offer improved convergence in some cases, but neither is a panacea.
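As an illustration of the diagnostic named in the abstract, the average standard deviation of split frequencies (ASDSF) can be sketched in a few lines. This is a minimal sketch and not the authors' code: the function names, the representation of each sampled tree as a set of splits (each split a frozenset of taxon names), the 0.1 minimum-frequency cutoff, and the use of the population standard deviation are all assumptions made for the example.

```python
from collections import Counter
from statistics import pstdev


def split_frequencies(trees):
    """Fraction of sampled trees in one chain that contain each split."""
    counts = Counter(split for tree in trees for split in tree)
    return {split: n / len(trees) for split, n in counts.items()}


def asdsf(chains, min_freq=0.1):
    """Average, over splits, of the std. dev. of split frequencies across chains.

    Only splits reaching `min_freq` in at least one chain are included
    (the cutoff value is an assumption for this sketch).
    """
    freqs = [split_frequencies(trees) for trees in chains]
    splits = {s for f in freqs for s, v in f.items() if v >= min_freq}
    if not splits:
        return 0.0
    # A split missing from a chain has frequency 0 in that chain.
    per_split_sd = [pstdev([f.get(s, 0.0) for f in freqs]) for s in splits]
    return sum(per_split_sd) / len(per_split_sd)


# Toy example with four taxa (A, B, C, D): each unrooted tree has one
# nontrivial split, written here as the side containing taxon A.
AB = frozenset({"A", "B"})
AC = frozenset({"A", "C"})
chain1 = [{AB}, {AB}, {AB}, {AC}]  # AB sampled 75% of the time
chain2 = [{AB}, {AB}, {AC}, {AC}]  # AB sampled 50% of the time
print(asdsf([chain1, chain2]))
```

Values near zero indicate that independent chains sampled splits at similar frequencies; larger values suggest the chains disagree about topology.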

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cc59/8042746/41967c5517fa/msaa295f1.jpg
