Abhirup Datta, Sudipto Banerjee, Andrew O. Finley, Alan E. Gelfand
Department of Biostatistics, University of Minnesota, Minneapolis, MN, USA.
Department of Biostatistics, University of California, Los Angeles, CA, USA.
Wiley Interdiscip Rev Comput Stat. 2016 Sep-Oct;8(5):162-171. doi: 10.1002/wics.1383. Epub 2016 Aug 4.
Gaussian process (GP) models provide a very flexible nonparametric approach to modeling location- and time-indexed datasets. However, the storage and computational requirements of GP models (quadratic storage and cubic computation in the number of locations) become infeasible for large spatial datasets. Nearest Neighbor Gaussian Processes (NNGP; Datta A, Banerjee S, Finley AO, Gelfand AE. Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. JASA, 2016) provide a scalable alternative by using local information from a few nearest neighbors. Scalability is achieved by using the neighbor sets in a conditional specification of the model. We show how this is equivalent to sparse modeling of the Cholesky factors of large covariance matrices. We also discuss a general approach to constructing scalable Gaussian processes using sparse local kriging. We present a multivariate data analysis demonstrating that the nearest-neighbor approach yields inference indistinguishable from that of the full-rank GP while being several times faster. Finally, we propose a variant of the NNGP model that automates the selection of the neighbor set size.
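The conditional specification and its link to sparse triangular factors can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; it assumes an exponential covariance (parameters `sigma2`, `phi`), takes the locations in the order given, and defines the neighbor set of location i as its m nearest earlier locations. The helper names `exp_cov`, `nngp_factors` and `nngp_logpdf` are hypothetical. Each row of `B` holds local kriging weights, `F` holds the conditional variances, and the implied precision matrix is (I - B)^T diag(1/F) (I - B).

```python
import numpy as np


def exp_cov(a, b, sigma2=1.0, phi=3.0):
    """Assumed exponential covariance sigma2 * exp(-phi * ||a_i - b_j||)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * d)


def nngp_factors(coords, m=5, **cov_kw):
    """Hypothetical helper: return strictly lower-triangular B and vector F
    such that (I - B) w has independent N(0, F_i) entries.

    The neighbor set of location i is its m nearest locations among
    0..i-1 in the given ordering.
    """
    n = coords.shape[0]
    B = np.zeros((n, n))  # dense for readability; row i has at most m nonzeros
    F = np.zeros(n)
    for i in range(n):
        c_ii = exp_cov(coords[i:i + 1], coords[i:i + 1], **cov_kw)[0, 0]
        if i == 0:
            F[0] = c_ii  # the first location has no neighbors
            continue
        dist = np.linalg.norm(coords[:i] - coords[i], axis=1)
        nb = np.argsort(dist)[:m]  # m nearest earlier locations
        C_nn = exp_cov(coords[nb], coords[nb], **cov_kw)
        c_in = exp_cov(coords[i:i + 1], coords[nb], **cov_kw)[0]
        b = np.linalg.solve(C_nn, c_in)  # local kriging weights
        B[i, nb] = b
        F[i] = c_ii - c_in @ b  # conditional (kriging) variance
    return B, F


def nngp_logpdf(w, B, F):
    """Log-density of w under the NNGP.

    I - B is unit lower-triangular, so det(I - B) = 1 and the
    log-determinant of the precision is simply -sum(log F).
    """
    r = w - B @ w
    return -0.5 * (len(w) * np.log(2 * np.pi) + np.sum(np.log(F)) + np.sum(r ** 2 / F))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.uniform(size=(200, 2))
    B, F = nngp_factors(coords, m=10)
    w = rng.standard_normal(200)
    print(nngp_logpdf(w, B, F))
```

With m = n - 1 every earlier location is a neighbor, and the construction reduces to the exact sequential factorization of the full GP. A small m instead gives a sparse unit lower-triangular factor of the precision matrix, so each row costs one m-by-m solve rather than work on the full n-by-n covariance.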