Baele Guy, Lemey Philippe, Suchard Marc A
Department of Microbiology and Immunology, Rega Institute, KU Leuven-University of Leuven, Leuven, Belgium
Syst Biol. 2016 Mar;65(2):250-64. doi: 10.1093/sysbio/syv083. Epub 2015 Nov 1.
Marginal likelihood estimates to compare models using Bayes factors frequently accompany Bayesian phylogenetic inference. Approaches to estimate marginal likelihoods have garnered increased attention over the past decade. In particular, the introduction of path sampling (PS) and stepping-stone sampling (SS) into Bayesian phylogenetics has tremendously improved the accuracy of model selection. These sampling techniques are now used to evaluate complex evolutionary and population genetic models on empirical data sets, but considerable computational demands hamper their widespread adoption. Further, when very diffuse but proper priors are specified for model parameters, numerical issues complicate the exploration of the priors, a necessary step in marginal likelihood estimation using PS or SS. To avoid such instabilities, generalized SS (GSS) has recently been proposed, introducing the concept of "working distributions" to facilitate, or shorten, the integration process that underlies marginal likelihood estimation. However, the need to fix the tree topology currently limits GSS in a coalescent-based framework. Here, we extend GSS by relaxing the fixed underlying tree topology assumption. To this end, we introduce a "working" distribution on the space of genealogies, which enables estimating marginal likelihoods while accommodating phylogenetic uncertainty. We propose two different "working" distributions that help GSS to outperform PS and SS in terms of accuracy when comparing demographic and evolutionary models applied to synthetic data and real-world examples. Further, we show that the use of very diffuse priors can lead to a considerable overestimation of the marginal likelihood when using PS and SS, whereas both GSS approaches still recover the correct marginal likelihood. The methods used in this article are available in BEAST, a powerful user-friendly software package to perform Bayesian evolutionary analyses.
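To make the idea behind stepping-stone sampling concrete, the following is a minimal sketch of the original SS estimator (not the generalized variant with working distributions, and not the BEAST implementation) on a toy problem. Everything in it is an assumption chosen for illustration: the model is a normal likelihood with known standard deviation and a diffuse normal prior on the mean, the data values are made up, and the power posteriors are sampled exactly by conjugacy instead of by MCMC. The power schedule uses evenly spaced quantiles of a Beta(0.3, 1) distribution, a common choice for SS. Because the toy model has a closed-form marginal likelihood, the estimate can be checked against the exact value.

```python
import math
import random

# Hypothetical toy data and settings, chosen only for illustration.
DATA = [1.2, 0.7, 2.1, 1.5, 0.3, 1.9, 1.1, 0.8, 1.6, 1.4]
SIGMA = 1.0            # known observation standard deviation
M0, S0 = 0.0, 10.0     # diffuse but proper Normal(M0, S0^2) prior on the mean


def log_likelihood(mu, data, sigma):
    """Log-likelihood of i.i.d. Normal(mu, sigma^2) observations."""
    n = len(data)
    sq = sum((y - mu) ** 2 for y in data)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - 0.5 * sq / sigma ** 2


def sample_power_posterior(beta, data, sigma, m0, s0, rng):
    """Exact draw from prior(mu) * likelihood(mu)^beta, Gaussian by conjugacy."""
    precision = 1 / s0 ** 2 + beta * len(data) / sigma ** 2
    mean = (m0 / s0 ** 2 + beta * sum(data) / sigma ** 2) / precision
    return rng.gauss(mean, math.sqrt(1 / precision))


def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values) / len(values))


def stepping_stone_log_ml(data, sigma, m0, s0,
                          n_stones=32, n_draws=2000, alpha=0.3, seed=1):
    """Stepping-stone estimate of the log marginal likelihood."""
    rng = random.Random(seed)
    # Powers from 0 (prior) to 1 (posterior), concentrated near the prior.
    betas = [(k / n_stones) ** (1 / alpha) for k in range(n_stones + 1)]
    total = 0.0
    for prev, nxt in zip(betas[:-1], betas[1:]):
        # Each ratio of normalizing constants is E_prev[L^(nxt - prev)].
        log_l = [
            log_likelihood(
                sample_power_posterior(prev, data, sigma, m0, s0, rng),
                data, sigma)
            for _ in range(n_draws)
        ]
        total += log_mean_exp([(nxt - prev) * ll for ll in log_l])
    return total


def exact_log_ml(data, sigma, m0, s0):
    """Closed-form log marginal likelihood of the normal-normal model."""
    n = len(data)
    resid = [y - m0 for y in data]
    total_var = sigma ** 2 + n * s0 ** 2
    log_det = (n - 1) * math.log(sigma ** 2) + math.log(total_var)
    quad = (sum(r * r for r in resid)
            - s0 ** 2 * sum(resid) ** 2 / total_var) / sigma ** 2
    return -0.5 * (n * math.log(2 * math.pi) + log_det + quad)


if __name__ == "__main__":
    print("stepping-stone:", stepping_stone_log_ml(DATA, SIGMA, M0, S0))
    print("exact:         ", exact_log_ml(DATA, SIGMA, M0, S0))
```

In a phylogenetic setting the exact draws would be replaced by MCMC samples from each power posterior, which is where the computational cost described above arises. The first stone draws directly from the prior, which hints at why very diffuse priors can destabilize PS and SS in practice.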