Department of Statistics, Columbia University, New York.
Department of Biostatistics, University of California, Los Angeles, California.
Biometrics. 2022 Jun;78(2):560-573. doi: 10.1111/biom.13452. Epub 2021 Mar 23.
Multivariate spatially oriented data sets are prevalent in the environmental and physical sciences. Scientists seek to jointly model multiple variables, each indexed by a spatial location, to capture any underlying spatial association for each variable and associations among the different dependent variables. Multivariate latent spatial process models have proved effective in driving statistical inference and rendering better predictive inference at arbitrary locations for the spatial process. High-dimensional multivariate spatial data, which are the theme of this article, refer to data sets where both the number of spatial locations and the number of spatially dependent variables are very large. The field has witnessed substantial developments in scalable models for univariate spatial processes, but such methods for multivariate spatial processes, especially when the number of outcomes is moderately large, are limited in comparison. Here, we extend scalable modeling strategies for a single process to multivariate processes. We pursue Bayesian inference, which is attractive for full uncertainty quantification of the latent spatial process. Our approach exploits distribution theory for the matrix-normal distribution, which we use to construct scalable versions of a hierarchical linear model of coregionalization (LMC) and spatial factor models that deliver inference over a high-dimensional parameter space including the latent spatial process. We illustrate the computational and inferential benefits of our algorithms over competing methods using simulation studies and an analysis of a massive vegetation index data set.
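As a rough illustration of why the matrix-normal distribution is convenient here, the sketch below draws a latent field for several outcomes at once. It is not the authors' algorithm. It assumes a simplified LMC-type setting in which all outcomes share one spatial correlation matrix, so the n-by-q latent matrix is matrix-normal with a row covariance over locations and a column covariance over outcomes. The exponential correlation function, the decay value, the loading matrix, and all function names are hypothetical choices made for the example.

```python
import numpy as np


def exponential_correlation(coords, decay):
    """Exponential spatial correlation exp(-decay * distance); an assumed kernel."""
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.exp(-decay * dists)


def sample_matrix_normal(mean, row_cov, col_cov, rng):
    """Draw X ~ MN(mean, row_cov, col_cov) as mean + L_r Z L_c^T."""
    row_chol = np.linalg.cholesky(row_cov)  # row_cov = L_r L_r^T
    col_chol = np.linalg.cholesky(col_cov)  # col_cov = L_c L_c^T
    z = rng.standard_normal(mean.shape)     # i.i.d. N(0, 1) entries
    return mean + row_chol @ z @ col_chol.T


rng = np.random.default_rng(0)
n_locations, n_outcomes = 50, 3

# Hypothetical locations on the unit square and their correlation matrix.
coords = rng.uniform(size=(n_locations, 2))
spatial_corr = exponential_correlation(coords, decay=3.0)

# Hypothetical lower-triangular loading matrix A; outcome covariance is A A^T.
loading = np.array([[1.0, 0.0, 0.0],
                    [0.5, 1.0, 0.0],
                    [-0.3, 0.2, 1.0]])
outcome_cov = loading @ loading.T

# One joint draw of the latent process: rows are locations, columns are outcomes.
latent = sample_matrix_normal(
    np.zeros((n_locations, n_outcomes)), spatial_corr, outcome_cov, rng
)
```

The draw only factorizes an n-by-n and a q-by-q matrix, never the nq-by-nq covariance of the stacked vector, which equals the Kronecker product of the outcome covariance and the spatial correlation. That separable structure is the kind of saving that matrix-normal theory makes available, although the article's scalable constructions go beyond this toy case.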