Altekar Gautam, Dwarkadas Sandhya, Huelsenbeck John P, Ronquist Fredrik
Department of Computer Science, University of Rochester, USA.
Bioinformatics. 2004 Feb 12;20(3):407-15. doi: 10.1093/bioinformatics/btg427. Epub 2004 Jan 22.
Bayesian estimation of phylogeny is based on the posterior probability distribution of trees. Currently, the only numerical method that can effectively approximate posterior probabilities of trees is Markov chain Monte Carlo (MCMC). Standard implementations of MCMC can be prone to entrapment in local optima. Metropolis coupled MCMC [(MC)(3)], a variant of MCMC, allows multiple peaks in the landscape of trees to be more readily explored, but at the cost of increased execution time.
This paper presents a parallel algorithm for (MC)(3). The proposed parallel algorithm retains the ability to explore multiple peaks in the posterior distribution of trees while maintaining a fast execution time. The algorithm has been implemented using two popular parallel programming models: message passing and shared memory. Performance results indicate nearly linear speed improvement in both programming models for small and large data sets.
系统发育的贝叶斯估计基于树的后验概率分布。目前,唯一能够有效近似树的后验概率的数值方法是马尔可夫链蒙特卡罗(MCMC)。MCMC的标准实现可能容易陷入局部最优。MCMC的一种变体—— metropolis耦合MCMC [(MC)(3)],能够更轻松地探索树景观中的多个峰值,但代价是执行时间增加。
本文提出了一种用于(MC)(3)的并行算法。所提出的并行算法在保持快速执行时间的同时,保留了探索树后验分布中多个峰值的能力。该算法已使用两种流行的并行编程模型实现:消息传递和共享内存。性能结果表明,对于小数据集和大数据集,两种编程模型的速度提升几乎呈线性。