Zhan Wentao, Datta Abhirup
Department of Biostatistics, Johns Hopkins University.
J Am Stat Assoc. 2025;120(549):535-547. doi: 10.1080/01621459.2024.2356293. Epub 2024 Jun 24.
Analysis of geospatial data has traditionally been model-based, with a mean model, customarily specified as a linear regression on the covariates, and a Gaussian process (GP) covariance model encoding the spatial dependence. While nonlinear machine learning algorithms like neural networks are increasingly being used for spatial analysis, current approaches depart from the model-based setup and cannot explicitly incorporate spatial covariance. We propose NN-GLS, which embeds neural networks directly within the traditional GP geostatistical model to accommodate nonlinear mean functions while retaining all other advantages of GPs, like explicit modeling of the spatial covariance and prediction at new locations via kriging. In NN-GLS, estimation of the neural network parameters for the nonlinear mean of the GP explicitly accounts for the spatial covariance through use of the generalized least squares (GLS) loss, thus extending the linear case. We show that NN-GLS admits a representation as a special type of graph neural network (GNN). This connection facilitates the use of standard neural network computational techniques for irregular geospatial data, enabling novel and scalable mini-batching, backpropagation, and kriging schemes. We provide methodology to obtain uncertainty bounds for estimation and predictions from NN-GLS. Theoretically, we show that NN-GLS is consistent for irregularly observed, spatially correlated data processes. We also provide a finite-sample concentration rate, which quantifies the need to accurately model the spatial covariance in neural networks for dependent data. To our knowledge, these are the first large-sample results for any neural network algorithm for irregular spatial data. We demonstrate the methodology through numerous simulations and an application to air pollution modeling. We develop a software implementation of NN-GLS in the Python package geospaNN.
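To make the GLS loss and the kriging step concrete, the following is a minimal NumPy sketch, not the geospaNN implementation and not its API. It assumes an exponential covariance function with hypothetical parameters `sigma2` and `phi`, and it treats the mean predictions (`mean_pred`, `mean_train`, `mean_new`) as outputs of any fitted mean model, for example a neural network; all function names here are illustrative.

```python
import numpy as np


def exp_cov(coords_a, coords_b, sigma2=1.0, phi=3.0):
    """Exponential covariance sigma2 * exp(-phi * distance).

    The covariance family and its parameters are assumptions made
    for illustration only.
    """
    diff = coords_a[:, None, :] - coords_b[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return sigma2 * np.exp(-phi * dist)


def gls_loss(y, mean_pred, cov):
    """GLS loss (1/n) * r^T cov^{-1} r, with residual r = y - mean_pred.

    The Cholesky factor L of cov whitens the residuals, so the loss is
    the mean squared whitened residual. With cov equal to the identity
    this reduces to the ordinary mean squared error.
    """
    resid = y - mean_pred
    chol = np.linalg.cholesky(cov)
    whitened = np.linalg.solve(chol, resid)  # L^{-1} r
    return float(whitened @ whitened) / len(y)


def krige(y, mean_train, mean_new, cov_train, cov_new_train):
    """Kriging prediction at new locations:

    mean_new + C(new, train) C(train, train)^{-1} (y - mean_train)
    """
    weights = np.linalg.solve(cov_train, y - mean_train)
    return mean_new + cov_new_train @ weights


# Small usage example with made-up coordinates and responses.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
new_coords = np.array([[0.5, 0.5]])
y = np.array([1.0, 2.0, 4.0])
mean_train = np.array([0.5, 2.5, 3.0])  # stand-in for network output
mean_new = np.array([2.0])              # stand-in for network output

cov_train = exp_cov(coords, coords)
loss = gls_loss(y, mean_train, cov_train)
pred = krige(y, mean_train, mean_new, cov_train,
             exp_cov(new_coords, coords))
```

In a training loop, a loss of this form would be minimized over the network weights in place of the usual squared-error loss. The dense Cholesky factorization used here costs cubic time in the number of locations, so it only suits small datasets; it does not reflect the scalable mini-batching and GNN-based computation described in the abstract.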