Češnovar Rok, Štrumbelj Erik
Faculty of Computer and Information Science, University of Ljubljana, Večna pot 113, 1000 Ljubljana, Slovenia.
PLoS One. 2017 Jun 28;12(6):e0180343. doi: 10.1371/journal.pone.0180343. eCollection 2017.
We describe efficient parallel GPU implementations of Bayesian inference for two classic statistical models: the Lasso and multinomial logistic regression. We focus on parallelizing the key components: matrix multiplication, matrix inversion, and sampling from the full conditionals. Our GPU implementations of the Bayesian Lasso and Bayesian multinomial logistic regression achieve 100-fold speedups on mid-level and high-end GPUs. Substantial speedups of 25-fold can also be achieved on older and lower-end GPUs. The samplers are implemented in OpenCL and can be used on any type of GPU as well as on other types of computational units, which makes them more convenient in practice than related work.
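To illustrate where the three components named above (matrix multiplication, matrix inversion, and sampling from the full conditionals) appear in such a sampler, the following is a minimal CPU sketch of a Gibbs sampler for the Bayesian Lasso in NumPy. It is not the authors' OpenCL code. It assumes the standard hierarchical formulation of the Bayesian Lasso (as in Park and Casella), a fixed penalty `lam`, and roughly centered predictors; the function name `bayesian_lasso_gibbs` and all parameter names are hypothetical.

```python
import numpy as np


def bayesian_lasso_gibbs(X, y, lam=1.0, n_iter=2000, burn_in=500, seed=0):
    """Illustrative Gibbs sampler for the Bayesian Lasso (CPU, NumPy only).

    Assumptions (not taken from the abstract): fixed penalty `lam`,
    centered response, and predictors that are roughly centered.
    Returns the post-burn-in draws of the regression coefficients.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    y = y - y.mean()              # center the response

    # Matrix multiplications that can be computed once, outside the loop.
    XtX = X.T @ X
    Xty = X.T @ y

    sigma2 = 1.0
    inv_tau2 = np.ones(p)         # 1 / tau_j^2, the latent scale parameters
    draws = np.empty((n_iter, p))

    for it in range(n_iter):
        # Matrix inversion: A = X'X + D_tau^{-1} changes every iteration.
        A_inv = np.linalg.inv(XtX + np.diag(inv_tau2))
        A_inv = (A_inv + A_inv.T) / 2.0   # remove round-off asymmetry

        # Full conditional of beta: N(A^{-1} X'y, sigma^2 A^{-1}).
        beta = rng.multivariate_normal(A_inv @ Xty, sigma2 * A_inv)

        # Full conditional of sigma^2: inverse gamma, drawn as 1 / Gamma.
        resid = y - X @ beta
        shape = (n - 1) / 2.0 + p / 2.0
        rate = resid @ resid / 2.0 + beta @ (inv_tau2 * beta) / 2.0
        sigma2 = 1.0 / rng.gamma(shape, 1.0 / rate)

        # Full conditional of 1 / tau_j^2: inverse Gaussian (Wald).
        # beta^2 is floored to avoid division by zero.
        mean = np.sqrt(lam**2 * sigma2 / np.maximum(beta**2, 1e-12))
        inv_tau2 = rng.wald(mean, lam**2)

        draws[it] = beta

    return draws[burn_in:]
```

In this sketch the per-iteration cost is dominated by the inversion of the p-by-p matrix and by the products involving `X`, which are the operations the abstract identifies as targets for parallelization.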