Mingas Grigorios, Bottolo Leonardo, Bouganis Christos-Savvas
Department of Electrical and Electronic Engineering, Imperial College London, London, SW7 2AZ, UK.
Department of Medical Genetics, University of Cambridge, Cambridge, CB2 0QQ, UK.
Int J Approx Reason. 2017 Apr;83:413-433. doi: 10.1016/j.ijar.2016.10.011.
Particle Markov Chain Monte Carlo (pMCMC) is a stochastic algorithm designed to generate samples from a probability distribution, when the density of the distribution does not admit a closed form expression. pMCMC is most commonly used to sample from the Bayesian posterior distribution in State-Space Models (SSMs), a class of probabilistic models used in numerous scientific applications. Nevertheless, this task is prohibitive when dealing with complex SSMs with massive data, due to the high computational cost of pMCMC and its poor performance when the posterior exhibits multi-modality. This paper aims to address both issues by: 1) Proposing a novel pMCMC algorithm (denoted ppMCMC), which uses multiple Markov chains (instead of the one used by pMCMC) to improve sampling efficiency for multi-modal posteriors, 2) Introducing custom, parallel hardware architectures, which are tailored for pMCMC and ppMCMC. The architectures are implemented on Field Programmable Gate Arrays (FPGAs), a type of hardware accelerator with massive parallelization capabilities. The new algorithm and the two FPGA architectures are evaluated using a large-scale case study from genetics. Results indicate that ppMCMC achieves 1.96x higher sampling efficiency than pMCMC when using sequential CPU implementations. The FPGA architecture of pMCMC is 12.1x and 10.1x faster than state-of-the-art, parallel CPU and GPU implementations of pMCMC and up to 53x more energy efficient; the FPGA architecture of ppMCMC increases these speedups to 34.9x and 41.8x respectively and is 173x more power efficient, bringing previously intractable SSM-based data analyses within reach.
粒子马尔可夫链蒙特卡罗(pMCMC)是一种随机算法,旨在当概率分布的密度没有封闭形式的表达式时,从该概率分布中生成样本。pMCMC最常用于从状态空间模型(SSM)中的贝叶斯后验分布进行采样,SSM是一类在众多科学应用中使用的概率模型。然而,在处理具有大量数据的复杂SSM时,由于pMCMC的计算成本高且在后验呈现多模态时性能不佳,这项任务变得非常困难。本文旨在通过以下方式解决这两个问题:1)提出一种新颖的pMCMC算法(表示为ppMCMC),它使用多个马尔可夫链(而不是pMCMC使用的单个马尔可夫链)来提高对多模态后验的采样效率;2)引入定制的并行硬件架构,这些架构是为pMCMC和ppMCMC量身定制的。这些架构在现场可编程门阵列(FPGA)上实现,FPGA是一种具有大规模并行化能力的硬件加速器。使用来自遗传学的大规模案例研究对新算法和两种FPGA架构进行了评估。结果表明,在使用顺序CPU实现时,ppMCMC的采样效率比pMCMC高1.96倍。pMCMC的FPGA架构比pMCMC的最新并行CPU和GPU实现快12.1倍和10.1倍,并且能源效率提高多达53倍;ppMCMC的FPGA架构分别将这些加速提高到34.9倍和41.8倍,并且能源效率提高173倍,使以前难以处理的基于SSM的数据分析变得可行。