Peruzzi Michele, Banerjee Sudipto, Finley Andrew O
Department of Forestry, Michigan State University.
Department of Statistical Science, Duke University.
J Am Stat Assoc. 2022;117(538):969-982. doi: 10.1080/01621459.2020.1833889. Epub 2020 Nov 24.
We introduce a class of scalable Bayesian hierarchical models for the analysis of massive geostatistical datasets. The approach combines ideas from high-dimensional geostatistics: it partitions the spatial domain and models the regions of the partition using a sparsity-inducing directed acyclic graph (DAG). We extend the model over the DAG to a well-defined spatial process, which we call the Meshed Gaussian Process (MGP). A major contribution is the development of MGPs on tessellated domains, accompanied by a Gibbs sampler for the efficient recovery of spatial random effects. In particular, the cubic MGP (Q-MGP) can harness high-performance computing resources by executing all large-scale operations in parallel within the Gibbs sampler, improving mixing and computing time compared to sequential updating schemes. Unlike some existing models for large spatial data, a Q-MGP facilitates massive caching of expensive matrix operations, making it particularly well suited to spatiotemporal remote-sensing data. We compare Q-MGPs against state-of-the-art methods on large synthetic and real-world data. We also illustrate the approach using Normalized Difference Vegetation Index (NDVI) data from the Serengeti park region, recovering latent multivariate spatiotemporal random effects at millions of locations. The source code is available at github.com/mkln/meshgp.
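The partition-plus-DAG idea described in the abstract can be illustrated with a small sketch. This is not the code from the meshgp repository. It assumes a regular 2D grid of blocks in which each block's parents are the neighboring blocks that precede it along each axis, and all function names (`build_grid_dag`, `markov_blankets`, `greedy_coloring`) are hypothetical. The sketch builds the DAG, derives each block's Markov blanket (parents, children, and co-parents), and then colors the blocks so that no block shares a color with a member of its blanket. Blocks of the same color could then, in principle, be updated in parallel within a Gibbs sweep.

```python
from itertools import product


def build_grid_dag(n_rows, n_cols):
    """Assumed DAG rule: parents of block (i, j) are (i-1, j) and (i, j-1)."""
    parents = {}
    for i, j in product(range(n_rows), range(n_cols)):
        candidates = ((i - 1, j), (i, j - 1))
        parents[(i, j)] = [p for p in candidates if p[0] >= 0 and p[1] >= 0]
    return parents


def markov_blankets(parents):
    """Return parents, children, and co-parents for every node of the DAG."""
    blanket = {node: set() for node in parents}
    for child, pars in parents.items():
        for p in pars:
            blanket[child].add(p)   # parent of child
            blanket[p].add(child)   # child of parent
        for a in pars:              # parents of a common child are linked
            for b in pars:
                if a != b:
                    blanket[a].add(b)
    return blanket


def greedy_coloring(blanket):
    """Give each node the smallest color unused within its Markov blanket."""
    colors = {}
    for node in sorted(blanket):
        used = {colors[nb] for nb in blanket[node] if nb in colors}
        colors[node] = next(c for c in range(len(blanket)) if c not in used)
    return colors


# Illustrative 4 x 5 grid of blocks (sizes chosen arbitrarily).
parents = build_grid_dag(4, 5)
blanket = markov_blankets(parents)
colors = greedy_coloring(blanket)

# Group blocks by color; each group is a candidate batch for a parallel update.
groups = {}
for node, color in colors.items():
    groups.setdefault(color, []).append(node)

for color in sorted(groups):
    print(color, groups[color])
```

A greedy coloring is used here only for simplicity. It guarantees a valid coloring but not necessarily the smallest number of colors for a given mesh.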