Rannala Bruce, Yang Ziheng
Department of Evolution and Ecology, University of California, Davis, CA 95616, USA.
Department of Genetics, Evolution and Environment, University College London, London WC1E 6BT, UK.
Syst Biol. 2017 Sep 1;66(5):823-842. doi: 10.1093/sysbio/syw119.
We develop a Bayesian method for inferring the species phylogeny under the multispecies coalescent (MSC) model. To improve the mixing properties of the Markov chain Monte Carlo (MCMC) algorithm that traverses the space of species trees, we implement two efficient MCMC proposals: the first is based on the Subtree Pruning and Regrafting (SPR) algorithm and the second is based on a node-slider algorithm. Like the Nearest-Neighbor Interchange (NNI) algorithm we implemented previously, both new algorithms propose changes to the species tree, while simultaneously altering the gene trees at multiple genetic loci to automatically avoid conflicts with the newly proposed species tree. The method integrates over gene trees, naturally taking account of the uncertainty of gene tree topology and branch lengths given the sequence data. A simulation study was performed to examine the statistical properties of the new method. The method was found to show excellent statistical performance, inferring the correct species tree with near certainty when 10 loci were included in the dataset. The prior on species trees has some impact, particularly for small numbers of loci. We analyzed several previously published datasets (both real and simulated) for rattlesnakes and Philippine shrews, in comparison with alternative methods. The results suggest that the Bayesian coalescent-based method is statistically more efficient than heuristic methods based on summary statistics, and that our implementation is computationally more efficient than alternative full-likelihood methods under the MSC. Parameter estimates for the rattlesnake data suggest drastically different evolutionary dynamics between the nuclear and mitochondrial loci, even though they support largely consistent species trees. We discuss the different challenges facing the marginal likelihood calculation and transmodel MCMC as alternative strategies for estimating posterior probabilities for species trees. 
[Bayes factor; Bayesian inference; MCMC; multispecies coalescent; nodeslider; species tree; SPR.]
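As an illustration of the transmodel MCMC strategy mentioned at the end of the abstract, the posterior probability of a species tree can be estimated by the proportion of MCMC samples that have that topology. The following is a minimal sketch of that bookkeeping only, not the authors' implementation: the nested-tuple tree representation and the function names `canonical` and `topology_posterior` are assumptions introduced here for illustration.

```python
from collections import Counter


def canonical(tree):
    """Return a canonical form of a rooted tree.

    Assumption for this sketch: a tree is either a tip name (str) or a
    tuple of subtrees. Children are sorted so that trees differing only
    in the left/right order of branches compare equal.
    """
    if isinstance(tree, str):
        return tree
    children = [canonical(child) for child in tree]
    # Sort on repr so that strings and tuples can be ordered together.
    return tuple(sorted(children, key=repr))


def topology_posterior(samples):
    """Estimate posterior probabilities as sample frequencies of topologies."""
    counts = Counter(canonical(tree) for tree in samples)
    total = len(samples)
    return {topology: count / total for topology, count in counts.items()}


# Hypothetical MCMC output: four sampled species trees for species A, B, C.
samples = [
    (("A", "B"), "C"),
    ("C", ("B", "A")),   # same topology as the first, branches rotated
    (("A", "C"), "B"),
    (("B", "A"), "C"),
]
posterior = topology_posterior(samples)
for topology, probability in posterior.items():
    print(topology, probability)
```

In a real analysis the samples would be taken after burn-in from a chain that proposes species-tree changes (for example through NNI, SPR, or node-slider moves), and the precision of these frequencies depends on how well the chain mixes across species trees.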