Ling Cheng, Hamada Tsuyoshi, Gao Jingyang, Zhao Guoguang, Sun Donghong, Shi Weifeng
IEEE/ACM Trans Comput Biol Bioinform. 2016 Sep-Oct;13(5):845-854. doi: 10.1109/TCBB.2015.2495202. Epub 2015 Oct 27.
MrBayes is a widely used phylogenetic inference tool that combines empirical evolutionary models with Bayesian statistics. However, its likelihood estimation is computationally expensive, resulting in undesirably long execution times. Although a number of multi-threaded optimizations have been proposed to speed up MrBayes, bottlenecks remain that severely limit the GPU thread-level parallelism of likelihood estimation. This study proposes a high-performance, resource-efficient method for GPU-oriented parallelization of likelihood estimation. Rather than relying on empirical programming, the proposed decomposition storage model implements high-performance data transfers implicitly. In terms of performance, the method achieves a speedup factor of up to 178 on the analysis of simulated datasets using four Tesla K40 cards. Compared with the other publicly available GPU-oriented MrBayes implementations, the tgMC++ method proposed here outperforms the tgMC (v1.0), nMC (v2.1.1) and oMC (v1.00) methods by speedup factors of up to 1.6, 1.9 and 2.9, respectively. Moreover, tgMC++ supports more evolutionary models and gamma categories, which previous GPU-oriented methods do not support.