Jun Seong-Hwan, Nasif Hassan, Jennings-Shaffer Chris, Rich David H, Kooperberg Anna, Fourment Mathieu, Zhang Cheng, Suchard Marc A, Matsen Frederick A
Department of Biostatistics and Computational Biology, University of Rochester, Rochester, USA.
Department of Statistics, University of Washington, Seattle, USA.
Algorithms Mol Biol. 2023 Jul 31;18(1):10. doi: 10.1186/s13015-023-00235-1.
Bayesian phylogenetics is a computationally challenging inferential problem. Classical methods are based on random-walk Markov chain Monte Carlo (MCMC), where random proposals are made on the tree parameter and the continuous parameters simultaneously. Variational phylogenetics is a promising alternative to MCMC, in which one fits an approximating distribution to the unnormalized phylogenetic posterior. Previous work fit this variational approximation using stochastic gradient descent, which is the canonical way of fitting general variational approximations. However, phylogenetic trees are special structures, giving opportunities for efficient computation. In this paper we describe a new algorithm that directly generalizes the Felsenstein pruning algorithm (a.k.a. sum-product algorithm) to compute a composite-like likelihood by marginalizing out ancestral states and subtrees simultaneously. We show the utility of this algorithm by rapidly making point estimates for branch lengths of a multi-tree phylogenetic model. These estimates accord with a long MCMC run and with estimates obtained using a variational method, but are much faster to obtain. Thus, although generalized pruning does not lead to a variational algorithm as such, we believe that it will form a useful starting point for variational inference.
贝叶斯系统发育学是一个计算上具有挑战性的推理问题。经典方法基于随机游走马尔可夫链蒙特卡罗(MCMC),即在树参数和连续参数上同时进行随机提议。变分系统发育学是MCMC的一种有前景的替代方法,其中将一个近似分布拟合到未归一化的系统发育后验分布上。先前的工作使用随机梯度下降来拟合这种变分近似,这是拟合一般变分近似的标准方法。然而,系统发育树是特殊的结构,为高效计算提供了机会。在本文中,我们描述了一种新算法,该算法直接推广了费尔斯滕森剪枝算法(又称和积算法),通过同时边缘化祖先状态和子树来计算类似复合似然。我们通过快速对多树系统发育模型的分支长度进行点估计来展示该算法的实用性。这些估计与长时间的MCMC运行结果以及使用变分方法获得的估计结果一致,但获得速度要快得多。因此,尽管广义剪枝本身不会导致变分算法,但我们相信它将成为变分推理的一个有用起点。