Lu Zhang, Abhirup Datta, Sudipto Banerjee
Department of Biostatistics, University of California, Los Angeles, California, USA.
Department of Biostatistics, Johns Hopkins University, Maryland, USA.
Stat Anal Data Min. 2019 Jun;12(3):197-209. doi: 10.1002/sam.11413. Epub 2019 Apr 23.
With continued advances in Geographic Information Systems and related computational technologies, statisticians are often required to analyze very large spatial datasets. Over the last decade this has generated substantial interest in scalable methodologies for analyzing large spatial datasets, a literature already too vast to be summarized here. Scalable spatial process models have been found especially attractive because of their richness and flexibility and, particularly in the Bayesian paradigm, because they can be embedded within hierarchical models. However, the vast majority of research articles in this domain have been geared toward innovative theory or more complex model development. Very limited attention has been accorded to easily implementable scalable hierarchical models for the practicing scientist or spatial analyst. This article devises massively scalable Bayesian approaches that can rapidly deliver inference on spatial processes that is practically indistinguishable from inference obtained using more expensive alternatives. A key emphasis is on implementation within very standard (modest) computing environments (e.g., a standard desktop or laptop) using easily available statistical software packages. Key insights are offered regarding the assumptions and approximations that bear on practical efficiency.