Gelfand Alan E, Sahu Sujit K, Holland David M
Institute of Statistics and Decisions, Duke University, Durham, NC, USA.
Environmetrics. 2012 Nov 1;23(7):565-578. doi: 10.1002/env.2169.
The choice of sampling locations in a spatial network is often guided by practical demands. In particular, many locations are preferentially chosen to capture high values of a response, for example, air pollution levels in environmental monitoring. As a result, model estimation and prediction of the exposure surface become biased due to the selective sampling. Since prediction is often the main purpose of the modeling, we suggest that the effect of preferential sampling lies more importantly in the resulting predictive surface than in parameter estimation. Our contribution is to offer a direct simulation-based approach to assessing the effects of preferential sampling. We compare two predictive surfaces over the study region, one originating from the notion of an 'operating' intensity driving the selection of monitoring sites, the other under complete spatial randomness. We can consider a range of response models. They may reflect the operating intensity, introduce alternative informative covariates, or just propose a flexible spatial model. Then, we can generate data under the given model. Upon fitting the model and interpolating (kriging), we obtain two predictive surfaces to compare. It is important to note that we need suitable metrics to compare the surfaces and that, because the predictive surfaces are random, we need to make expected comparisons, that is, comparisons averaged over realizations.
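The simulation idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several simplifications are assumptions made here for brevity: the study region is a unit square discretized to a grid, the 'operating' intensity is taken to be proportional to `exp(beta * truth)`, the covariance is exponential with parameters held at their true values rather than fitted, and the comparison metrics are the mean prediction bias and the mean squared prediction error averaged over replicates. All function and parameter names (`exp_cov`, `krige`, `compare_designs`, `beta`, `phi`, `tau2`) are hypothetical.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, phi=0.2):
    """Exponential covariance between two sets of 2-D locations."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi)

def krige(sites, y, grid, tau2=0.05):
    """Kriging predictor with a plug-in GLS estimate of the constant mean.

    Covariance parameters are assumed known (a simplification of this sketch).
    """
    K = exp_cov(sites, sites) + tau2 * np.eye(len(sites))
    ones = np.ones(len(sites))
    mu = (ones @ np.linalg.solve(K, y)) / (ones @ np.linalg.solve(K, ones))
    k = exp_cov(grid, sites)  # covariance between grid cells and sites
    return mu + k @ np.linalg.solve(K, y - mu)

def compare_designs(n_sites=40, n_rep=50, beta=2.0, m=20, tau2=0.05, seed=0):
    """Average bias and MSPE of the predictive surface under two designs."""
    rng = np.random.default_rng(seed)
    g = (np.arange(m) + 0.5) / m
    grid = np.array([(x, y) for x in g for y in g])
    n_cells = m * m
    # Cholesky factor used to draw realizations of the latent surface.
    chol = np.linalg.cholesky(exp_cov(grid, grid) + 1e-8 * np.eye(n_cells))
    results = {"preferential": [], "csr": []}
    for _ in range(n_rep):
        truth = chol @ rng.standard_normal(n_cells)
        w = np.exp(beta * truth)  # assumed 'operating' intensity
        probs = {
            "preferential": w / w.sum(),
            "csr": np.full(n_cells, 1.0 / n_cells),  # discretized CSR
        }
        for name, p in probs.items():
            idx = rng.choice(n_cells, size=n_sites, replace=False, p=p)
            y = truth[idx] + rng.normal(0.0, np.sqrt(tau2), n_sites)
            err = krige(grid[idx], y, grid, tau2) - truth
            results[name].append((err.mean(), (err ** 2).mean()))
    summary = {}
    for name, vals in results.items():
        bias, mspe = np.mean(vals, axis=0)  # expectation over replicates
        summary[name] = {"bias": float(bias), "mspe": float(mspe)}
    return summary

if __name__ == "__main__":
    for design, metrics in compare_designs().items():
        print(design, metrics)
```

Averaging over replicates is what makes this an expected comparison: any single pair of surfaces is one random draw. A fuller version would also estimate the covariance parameters in each replicate and could add informative covariates to the response model.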