
An Approach to Incorporate Subsampling into a Generic Bayesian Hierarchical Model.

Author information

Bradley Jonathan R

Affiliation

Department of Statistics, Florida State University, 117 N. Woodward Ave., Tallahassee, FL 32306-4330.

Publication information

J Comput Graph Stat. 2021;30(4):889-905. doi: 10.1080/10618600.2021.1923518. Epub 2021 Jun 21.

Abstract

The goal of this paper is to provide a way for Bayesian statisticians to incorporate subsampling directly into the Bayesian hierarchical model of their choosing without imposing additional restrictive model assumptions. We are motivated by the fact that the rise of "big data" has made it difficult for statisticians to apply their methods directly to big datasets. We introduce a "data subset model" into the popular "data model, process model, and parameter model" framework used to summarize Bayesian hierarchical models. The hyperparameters of the data subset model are specified constructively: they are chosen so that the implied size of the subset satisfies pre-defined computational constraints. Thus, these hyperparameters effectively calibrate the statistical model to the computer itself, so that predictions/estimates are obtained in a pre-specified amount of time. Several properties of the data subset model are provided, including propriety, partial sufficiency, and semi-parametric properties. Simulated datasets are used to assess the consequences of subsampling, and results are presented across different computers to show the effect of the computer on the statistical analysis. Additionally, we provide a joint analysis of a high-dimensional dataset (roughly 10 gigabytes) consisting of the 2018 5-year period estimates from the US Census Bureau's Public Use Micro-Sample (PUMS).
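The abstract describes choosing the data subset model's hyperparameters so that the implied subset size meets a pre-defined computational constraint, which ties the analysis to the machine it runs on. The sketch below illustrates only that calibration idea, under loudly stated assumptions: run time grows linearly with the number of observations, the subset is a simple random sample without replacement, and a conjugate normal model with known variance stands in for the user's hierarchical model. All function names are hypothetical; this is not the paper's data subset model or its hyperparameter specification.

```python
import math
import random
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def seconds_per_observation(fit: Callable[[Sequence[T]], object],
                            data: Sequence[T],
                            pilot_size: int = 1_000,
                            seed: int = 0) -> float:
    """Time `fit` on a small pilot subset on *this* machine.

    Hypothetical assumption: run time is linear in the number of
    observations, so one pilot run yields a per-observation cost.
    """
    rng = random.Random(seed)
    pilot = rng.sample(list(data), min(pilot_size, len(data)))
    start = time.perf_counter()
    fit(pilot)
    return (time.perf_counter() - start) / len(pilot)


def subset_size_for_budget(budget_s: float, cost_per_obs_s: float, n: int) -> int:
    """Largest m in [1, n] with predicted run time m * cost within the budget
    (returns 1 if even a single observation would exceed it)."""
    if cost_per_obs_s <= 0:
        return n
    return max(1, min(n, math.floor(budget_s / cost_per_obs_s)))


def draw_subset(data: Sequence[T], m: int, seed: int = 0) -> list[T]:
    """Simple random sample of size m without replacement (an assumed design)."""
    return random.Random(seed).sample(list(data), m)


def normal_posterior(x: Sequence[float], sigma: float = 1.0,
                     mu0: float = 0.0, tau0: float = 10.0) -> tuple[float, float]:
    """Posterior (mean, sd) of a normal mean with known sd `sigma` and a
    N(mu0, tau0^2) prior; a toy stand-in for the user's own model."""
    precision = 1.0 / tau0**2 + len(x) / sigma**2
    mean = (mu0 / tau0**2 + sum(x) / sigma**2) / precision
    return mean, math.sqrt(1.0 / precision)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(2.0, 1.0) for _ in range(100_000)]
    # Calibrate to the current computer, then pick the subset size for a 1 ms budget.
    cost = seconds_per_observation(normal_posterior, data)
    m = subset_size_for_budget(budget_s=1e-3, cost_per_obs_s=cost, n=len(data))
    mean, sd = normal_posterior(draw_subset(data, m))
    print(f"m={m}, posterior mean={mean:.3f}, posterior sd={sd:.4f}")
```

Running the same script on a slower computer yields a smaller `m` and hence a wider posterior, which mirrors, in a very simplified form, the abstract's point that the computer itself affects the statistical analysis.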
