ithree institute, University of Technology Sydney, Ultimo, NSW 2007, Australia.
Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.
Syst Biol. 2018 May 1;67(3):490-502. doi: 10.1093/sysbio/syx090.
Modern infectious disease outbreak surveillance produces continuous streams of sequence data that require phylogenetic analysis as the data arrive. Current software packages for Bayesian phylogenetic inference are unable to quickly incorporate new sequences as they become available, making them less useful for dynamically unfolding evolutionary stories. This limitation can be addressed by applying a class of Bayesian statistical inference algorithms called sequential Monte Carlo (SMC) to conduct online inference, wherein new data can be continuously incorporated to update the estimate of the posterior probability distribution. In this article, we describe and evaluate several different online phylogenetic sequential Monte Carlo (OPSMC) algorithms. We show that proposing new phylogenies with a density similar to the Bayesian prior performs poorly, and we develop "guided" proposals that better match the proposal density to the posterior. Furthermore, we show that the simplest guided proposals can exhibit pathological behavior in some situations, leading to poor results, and that the situation can be resolved by heating the proposal density. The results demonstrate that, relative to the widely used MCMC-based algorithm implemented in MrBayes, the total time required to compute a series of phylogenetic posteriors as sequences arrive can be significantly reduced by the use of OPSMC, without incurring a significant loss in accuracy.
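The interplay between prior-like, guided, and heated proposals described in the abstract can be illustrated with a deliberately small toy. The sketch below is not the OPSMC implementation: it assumes that a new sequence is added to a particle by choosing one of a few candidate attachment edges, that the prior over edges is uniform, and that a log-likelihood is already known for each edge. The function names, the heating exponent `beta`, and the example log-likelihoods are all hypothetical. Setting `beta = 0` gives a prior-like (uniform) proposal, `beta = 1` gives a fully likelihood-guided proposal, and values in between give a heated proposal. The toy is too simple to reproduce the pathological behavior the abstract reports for the simplest guided proposals; it only shows how heating changes the proposal and how the importance weight compensates.

```python
import math
import random


def heated_proposal(log_liks, beta):
    """Return proposal probabilities q(e) proportional to exp(beta * log_liks[e]).

    beta = 0 is uniform (prior-like), beta = 1 is fully guided by the likelihood.
    """
    m = max(log_liks)  # subtract the maximum for numerical stability
    unnorm = [math.exp(beta * (ll - m)) for ll in log_liks]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def propose_and_weight(log_liks, beta, rng):
    """Sample an attachment edge and return it with its log importance weight."""
    q = heated_proposal(log_liks, beta)
    edge = rng.choices(range(len(log_liks)), weights=q, k=1)[0]
    # Assumed target: uniform prior over edges times the edge likelihood.
    log_target = log_liks[edge] - math.log(len(log_liks))
    return edge, log_target - math.log(q[edge])


def normalize(log_weights):
    """Convert log weights to normalized weights that sum to one."""
    m = max(log_weights)
    ws = [math.exp(lw - m) for lw in log_weights]
    total = sum(ws)
    return [w / total for w in ws]


def effective_sample_size(weights):
    """ESS of normalized weights; equals len(weights) when all are equal."""
    return 1.0 / sum(w * w for w in weights)


def resample(particles, weights, rng):
    """Multinomial resampling: draw a new equally weighted particle set."""
    return rng.choices(particles, weights=weights, k=len(particles))


def run_update(log_liks, beta, n_particles=1000, seed=0):
    """Run one toy SMC update; return the ESS and the resampled edges."""
    rng = random.Random(seed)
    draws = [propose_and_weight(log_liks, beta, rng) for _ in range(n_particles)]
    edges = [edge for edge, _ in draws]
    weights = normalize([log_w for _, log_w in draws])
    return effective_sample_size(weights), resample(edges, weights, rng)


if __name__ == "__main__":
    # Hypothetical log-likelihoods for four candidate attachment edges.
    example_log_liks = [-10.0, -12.0, -3.0, -11.0]
    for beta in (0.0, 0.5, 1.0):
        ess, _ = run_update(example_log_liks, beta)
        print(f"beta={beta:.1f}  ESS={ess:.1f}")
```

In this toy, the effective sample size (ESS) is lowest for the prior-like proposal, because most particles land on low-likelihood edges and receive negligible weight. In a realistic setting the proposal also has to cover continuous parameters such as branch lengths, so the choice of heating exponent is a trade-off rather than a matter of simply setting `beta = 1`.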