Key Laboratory of Vertebrate Evolution and Human Origins, Institute of Vertebrate Paleontology and Paleoanthropology, Chinese Academy of Sciences, 142 XizhimenWai Street, Beijing 100044, China.
Center for Excellence in Life and Paleoenvironment, Chinese Academy of Sciences, 142 XizhimenWai Street, Beijing 100044, China.
Syst Biol. 2020 Sep 1;69(5):1016-1032. doi: 10.1093/sysbio/syaa002.
Sampling across tree space is one of the major challenges in Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) algorithms. Standard MCMC tree moves consider small random perturbations of the topology, and select from candidate trees at random or based on the distance between the old and new topologies. MCMC algorithms using such moves tend to get trapped in tree space, making them slow in finding the globally most probable trees (known as "convergence") and in estimating the correct proportions of the different types of trees (known as "mixing"). Here, we introduce a new class of moves, which propose trees based on their parsimony scores. The proposal distribution derived from the parsimony scores is a quickly computable, albeit rough, approximation of the conditional posterior distribution over candidate trees. We demonstrate with simulations that parsimony-guided moves correctly sample the uniform distribution of topologies from the prior. We then evaluate their performance against standard moves using six challenging empirical data sets, for which we were able to obtain accurate reference estimates of the posterior using long MCMC runs, a mix of topology proposals, and Metropolis coupling. On these data sets, ranging in size from 357 to 934 taxa and from 1740 to 5681 sites, we find that single chains using parsimony-guided moves usually converge an order of magnitude faster than chains using standard moves. They also exhibit better mixing, that is, they cover the most probable trees more quickly. Our results show that tree moves based on quick and dirty estimates of the posterior probability can significantly outperform standard moves. Future research will have to show to what extent the performance of such moves can be improved further by finding better ways of approximating the posterior probability, taking the trade-off between accuracy and speed into account. [Bayesian phylogenetic inference; MCMC; parsimony; tree proposal.]
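To make the idea of a parsimony-derived proposal distribution concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation: the abstract does not say how scores are turned into probabilities, so the exponential weighting `exp(-beta * score)`, the `beta` parameter, the nested-tuple tree encoding, and all function names are assumptions. The sketch scores each candidate topology with the Fitch algorithm and draws one candidate with probability that decreases as its parsimony score increases.

```python
import math
import random


def fitch_score(tree, states):
    """Return the Fitch parsimony score (number of changes) for one site.

    tree   -- nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping each leaf name to its character state at this site
    """
    def visit(node):
        if isinstance(node, str):            # leaf: the observed state, no changes
            return {states[node]}, 0
        (left_set, left_cost) = visit(node[0])
        (right_set, right_cost) = visit(node[1])
        common = left_set & right_set
        if common:                           # child sets overlap: no extra change
            return common, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def parsimony_score(tree, alignment):
    """Sum Fitch scores over all sites; alignment maps leaf name -> sequence."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_score(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(n_sites)
    )


def proposal_probs(scores, beta=1.0):
    """Map parsimony scores to proposal probabilities (assumed weighting).

    Lower scores receive higher probability; beta = 0 gives a uniform proposal.
    """
    best = min(scores)                       # subtract the minimum for numerical stability
    weights = [math.exp(-beta * (s - best)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def propose(candidates, alignment, beta=1.0, rng=random):
    """Draw one candidate tree; return it with its forward proposal probability."""
    scores = [parsimony_score(t, alignment) for t in candidates]
    probs = proposal_probs(scores, beta)
    idx = rng.choices(range(len(candidates)), weights=probs)[0]
    return candidates[idx], probs[idx]


# Toy data: the three possible topologies for four taxa.
alignment = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CAT"}
candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
scores = [parsimony_score(t, alignment) for t in candidates]
probs = proposal_probs(scores)
new_tree, forward_prob = propose(candidates, alignment)
```

In a full Metropolis-Hastings move, the forward probability returned by `propose` would have to be paired with the probability of proposing the old tree from the new one, computed over the candidate set of the reverse move, so that the acceptance ratio corrects for the asymmetric proposal. That correction is what would let a rough approximation such as parsimony guide the chain without changing the target distribution.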