Suppr超能文献

一种用于大数据贝叶斯分析的自助式梅塔ropolis-黑斯廷斯算法。

A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data.

作者信息

Liang Faming, Kim Jinsu, Song Qifan

机构信息

Professor, Department of Biostatistics, University of Florida, Gainesville, FL 32611.

Enterprise Solution Specialist, Big Data Analysis Consulting Team, LG CNS, Seoul, Republic of Korea.

出版信息

Technometrics. 2016;58(3):604-318. doi: 10.1080/00401706.2016.1142905. Epub 2016 Jul 8.

Abstract

Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data of complex structures. However, their computer-intensive nature, which typically require a large number of iterations and a complete scan of the full dataset for each iteration, precludes their use for big data analysis. In this paper, we propose the so-called bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for how to tame powerful MCMC methods to be used for big data analysis; that is to replace the full data log-likelihood by a Monte Carlo average of the log-likelihoods that are calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset in iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be generally more efficient as it can asymptotically integrate the whole data information into a single simulation run. The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. This is illustrated in the paper by the tempering BMH algorithm, which can be viewed as a combination of parallel tempering and the BMH algorithm. BMH can also be used for model selection and optimization by combining with reversible jump MCMC and simulated annealing, respectively.

摘要

马尔可夫链蒙特卡罗(MCMC)方法已被证明是分析复杂结构数据的一种非常强大的工具。然而,其计算密集型的特性,通常每次迭代都需要大量的迭代次数和对整个数据集进行完整扫描,这使得它们无法用于大数据分析。在本文中,我们提出了所谓的自助梅特罗波利斯 - 黑斯廷斯(BMH)算法,该算法提供了一个通用框架,用于说明如何驾驭强大的MCMC方法以用于大数据分析;即通过对数似然的蒙特卡罗平均值来代替全数据对数似然,该对数似然是从多个自助样本并行计算得到的。BMH算法具有易于并行化的结构,并且避免了在迭代中对整个数据集进行重复扫描,因此对于大数据问题是可行的。与流行的分而组合方法相比,BMH通常效率更高,因为它可以渐近地将整个数据信息整合到一次模拟运行中。BMH算法非常灵活。与梅特罗波利斯 - 黑斯廷斯算法一样,它可以作为开发适用于大数据问题的高级MCMC算法的基本构建块。本文通过回火BMH算法对此进行了说明,该算法可以看作是并行回火和BMH算法的结合。BMH还可以分别与可逆跳跃MCMC和模拟退火相结合,用于模型选择和优化。

相似文献

2
A general construction for parallelizing Metropolis-Hastings algorithms.一种并行化 Metropolis-Hastings 算法的通用构造。
Proc Natl Acad Sci U S A. 2014 Dec 9;111(49):17408-13. doi: 10.1073/pnas.1408184111. Epub 2014 Nov 24.
4
Double-Parallel Monte Carlo for Bayesian Analysis of Big Data.用于大数据贝叶斯分析的双平行蒙特卡罗方法
Stat Comput. 2019 Jan;29(1):23-32. doi: 10.1007/s11222-017-9791-1. Epub 2017 Nov 27.
6
Searching for efficient Markov chain Monte Carlo proposal kernels.搜索高效的马尔可夫链蒙特卡罗提议核。
Proc Natl Acad Sci U S A. 2013 Nov 26;110(48):19307-12. doi: 10.1073/pnas.1311790110. Epub 2013 Nov 11.
8
Two-Stage Metropolis-Hastings for Tall Data.用于高维数据的两阶段 metropolis-Hastings 算法
J Classif. 2018 Apr;35(1):29-51. doi: 10.1007/s00357-018-9248-z. Epub 2018 Mar 16.

文献AI研究员

20分钟写一篇综述,助力文献阅读效率提升50倍。

立即体验

用中文搜PubMed

大模型驱动的PubMed中文搜索引擎

马上搜索

文档翻译

学术文献翻译模型,支持多种主流文档格式。

立即体验