Department of Math & CS, Muhlenberg College, 2400 W Chew St, Allentown, PA, 18104, USA.
Computational Biology Program, Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., Seattle, WA, 98109, USA.
Bull Math Biol. 2024 Aug 5;86(9):114. doi: 10.1007/s11538-024-01338-5.
Bayesian phylogenetic inference is powerful but computationally intensive. Researchers may find themselves with two phylogenetic posteriors on overlapping data sets and may wish to approximate a combined result without having to re-run potentially expensive Markov chains on the combined data set. This raises the question: given overlapping subsets of a set of taxa (e.g. species or virus samples), and given posterior distributions on phylogenetic tree topologies for each of these taxon sets, how can we optimize a probability distribution on phylogenetic tree topologies for the entire taxon set? In this paper we develop a variational approach to this problem and demonstrate its effectiveness. Specifically, we develop an algorithm to find a suitable support of the variational tree topology distribution on the entire taxon set, as well as a gradient-descent algorithm to minimize the divergence from the restrictions of the variational distribution to each of the given per-subset probability distributions, in an effort to approximate the posterior distribution on the entire taxon set.
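The two ingredients named in the abstract are restricting a distribution on full-taxon-set topologies to each taxon subset, and running gradient descent on the divergence to each per-subset posterior. The Python sketch below is a hedged, simplified illustration of both. It is not the paper's method, and several pieces are assumptions:

- The support is a short, hand-picked list of topologies. The paper's support-finding algorithm is not reproduced.
- The variational distribution is a plain softmax over that list. This is not the paper's parameterization.
- The divergence is taken as KL(p_i || q restricted to S_i). The abstract does not state the direction.
- Topologies are encoded as sets of nontrivial splits.
- The helper names (`split`, `restrict_tree`, `pushforward`, `fit`) and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def split(a, b):
    """A split A|B as an unordered pair of taxon sets, so A|B == B|A."""
    return frozenset([frozenset(a), frozenset(b)])


def restrict_tree(tree, taxa):
    """Restrict an unrooted topology (frozenset of nontrivial splits) to `taxa`.

    Each split A|B maps to (A & taxa)|(B & taxa); results with a side of
    fewer than two taxa are trivial or empty and are dropped.
    """
    taxa = frozenset(taxa)
    kept = set()
    for s in tree:
        a, b = (side & taxa for side in s)
        if len(a) >= 2 and len(b) >= 2:
            kept.add(split(a, b))
    return frozenset(kept)


def softmax(theta):
    m = max(theta)
    e = [math.exp(t - m) for t in theta]
    z = sum(e)
    return [x / z for x in e]


def pushforward(q, images):
    """Q(r) = sum of q_k over support trees k whose restriction is r."""
    Q = defaultdict(float)
    for qk, r in zip(q, images):
        Q[r] += qk
    return Q


def fit(support, subsets, posteriors, lr=0.5, steps=500):
    """Gradient descent on softmax logits for sum_i KL(p_i || q restricted to S_i).

    Assumption: each posterior p_i is a dict {restricted topology: probability}
    whose values sum to 1.
    """
    images = [[restrict_tree(t, s) for t in support] for s in subsets]
    theta = [0.0] * len(support)
    for _ in range(steps):
        q = softmax(theta)
        grad = [0.0] * len(support)
        for imgs, p in zip(images, posteriors):
            Q = pushforward(q, imgs)
            # KL is infinite if p_i puts mass on a topology the support cannot produce.
            assert all(Q.get(r, 0.0) > 0 for r in p), "support does not cover p_i"
            for j, r in enumerate(imgs):
                # d KL(p_i || Q_i) / d theta_j = q_j * (1 - p_i(r_j) / Q_i(r_j))
                grad[j] += q[j] * (1.0 - p.get(r, 0.0) / Q[r])
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return softmax(theta)


# Hypothetical toy data: two 5-taxon topologies, subsets {A,B,C,D} and {A,B,C,E}.
T1 = frozenset([split("AB", "CDE"), split("ABC", "DE")])  # ((A,B),C,(D,E))
T2 = frozenset([split("AC", "BDE"), split("ABC", "DE")])  # ((A,C),B,(D,E))
p1 = {frozenset([split("AB", "CD")]): 0.8, frozenset([split("AC", "BD")]): 0.2}
p2 = {frozenset([split("AB", "CE")]): 0.6, frozenset([split("AC", "BE")]): 0.4}

q = fit([T1, T2], ["ABCD", "ABCE"], [p1, p2])
print([round(x, 3) for x in q])  # → [0.7, 0.3]
```

In this toy case each subset restriction sends T1 and T2 to distinct 4-taxon topologies. The objective therefore reduces to -(0.8 + 0.6) log q_1 - (0.2 + 0.4) log q_2 plus a constant. Its minimizer averages the two subset posteriors, giving q = (0.7, 0.3).

Enumerating the support explicitly keeps the restriction map and the gradient transparent. However, it only scales to small supports, which is why a compact parameterization and a dedicated support-finding step matter in practice.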