Department of Agricultural and Environmental Biology, University of Tokyo, Tokyo, Japan.
Mol Biol Evol. 2019 Apr 1;36(4):825-833. doi: 10.1093/molbev/msz020.
The pattern of molecular evolution varies among gene sites and genes in a genome. By taking into account the complex heterogeneity of evolutionary processes among sites in a genome, Bayesian infinite mixture models of genomic evolution enable robust phylogenetic inference. With large modern data sets, however, the computational burden of Markov chain Monte Carlo sampling techniques becomes prohibitive. Here, we have developed a variational Bayesian procedure to speed up the widely used PhyloBayes MPI program, which deals with the heterogeneity of amino acid profiles. Rather than sampling from the posterior distribution, the procedure approximates the (unknown) posterior distribution using a manageable distribution called the variational distribution. The parameters of the variational distribution are estimated by minimizing the Kullback-Leibler divergence. To examine performance, we analyzed three empirical data sets consisting of mitochondrial, plastid-encoded, and nuclear proteins. Our variational method accurately approximated the Bayesian inference of the phylogenetic tree, the mixture proportions, and the amino acid propensity of each component of the mixture while using orders of magnitude less computational time.
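The idea of fitting a variational distribution by minimizing the Kullback-Leibler divergence can be illustrated with a toy sketch. This is not the procedure implemented for PhyloBayes MPI: the one-dimensional target, the Gaussian variational family, the grid search, and all function names below are assumptions chosen only for illustration, and the divergence is taken in the direction KL(q || p), which is the usual choice in variational Bayes but is not stated in the abstract.

```python
import math


def normalize(weights):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(q, p):
    """KL(q || p) for two discrete distributions on the same grid."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


def gaussian_on_grid(grid, mu, sigma):
    """Variational family (assumed): a Gaussian shape discretized on the grid."""
    return normalize([math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in grid])


def fit_variational(grid, target, mus, sigmas):
    """Grid search for the (mu, sigma) pair that minimizes KL(q || target)."""
    best = None
    for mu in mus:
        for sigma in sigmas:
            kl = kl_divergence(gaussian_on_grid(grid, mu, sigma), target)
            if best is None or kl < best[0]:
                best = (kl, mu, sigma)
    return best


# Toy "posterior" (hypothetical): an unnormalized gamma-shaped density,
# known only up to a constant, evaluated on a grid from 0.05 to 10.0.
grid = [0.05 * i for i in range(1, 201)]
posterior = normalize([x ** 3 * math.exp(-2.0 * x) for x in grid])

# Candidate variational parameters.
mus = [0.1 * i for i in range(1, 51)]
sigmas = [0.1 * i for i in range(2, 31)]

best_kl, best_mu, best_sigma = fit_variational(grid, posterior, mus, sigmas)
print(f"best mu={best_mu:.1f}, sigma={best_sigma:.1f}, KL={best_kl:.4f}")
```

In a realistic setting the grid search would be replaced by an iterative optimizer, and the target would be a high-dimensional posterior over trees, mixture proportions, and amino acid profiles. The sketch only shows the general principle: an optimization over the parameters of a tractable distribution takes the place of sampling.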