Boaz R M, Lawson A B, Pearce J L
Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425, USA.
Environmetrics. 2019 Nov;30(7). doi: 10.1002/env.2592. Epub 2019 Jul 3.
Missing observations from air pollution monitoring networks have long posed a problem for investigators studying the health effects of air pollution. Growing interest in mixtures of air pollutants has further complicated this problem, raising new challenges that require novel methods. The objective of this study is to develop a methodology for multivariate prediction of air pollution. We focus specifically on different forms of missing data, such as spatial (sparse sites), outcome (pollutants not measured at some sites), and temporal (various kinds of interrupted time series). To address these challenges, we develop a novel multivariate fusion framework that leverages the observed inter-pollutant correlation structure to reduce error in the simultaneous prediction of multiple air pollutants. Our joint fusion model combines predictions from the Environmental Protection Agency's Community Multiscale Air Quality (CMAQ) model with spatio-temporal error terms. We applied our models to both simulated data and a case study in South Carolina covering 8 pollutants over a 28-day period in June 2006. We found that our model, which uses a multivariate correlated error in a Bayesian framework, showed promising predictive accuracy, particularly for gaseous pollutants.
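The idea of borrowing strength across pollutants can be illustrated with a much simpler stand-in than the Bayesian model described above. The sketch below assumes that, at a single site and time, the observed concentrations equal the CMAQ predictions plus a zero-mean Gaussian error with a known inter-pollutant covariance matrix; a pollutant that was not measured is then predicted by its conditional mean given the residuals of the measured pollutants. The function name `fuse_predict`, the fixed covariance, and the numbers are illustrative assumptions, not the authors' implementation, and the spatio-temporal error structure is omitted entirely.

```python
import numpy as np

def fuse_predict(cmaq, obs, sigma):
    """Fill unmeasured pollutants at one site/time (hypothetical helper).

    cmaq  : (p,) CMAQ predictions for p pollutants
    obs   : (p,) monitor observations, np.nan where a pollutant is unmeasured
    sigma : (p, p) assumed-known covariance of the inter-pollutant errors
    """
    cmaq = np.asarray(cmaq, dtype=float)
    obs = np.asarray(obs, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    miss = np.isnan(obs)
    pred = obs.copy()
    if not miss.any():
        return pred              # nothing to fill
    if miss.all():
        return cmaq.copy()       # no residuals available: fall back to CMAQ

    # Residuals of the measured pollutants relative to CMAQ
    resid = obs[~miss] - cmaq[~miss]

    # Conditional Gaussian mean: mu_m + S_mo S_oo^{-1} (y_o - mu_o)
    s_mo = sigma[np.ix_(miss, ~miss)]
    s_oo = sigma[np.ix_(~miss, ~miss)]
    pred[miss] = cmaq[miss] + s_mo @ np.linalg.solve(s_oo, resid)
    return pred

# Two pollutants with error correlation 0.8; the second is unmeasured.
filled = fuse_predict(
    cmaq=[10.0, 20.0],
    obs=[12.0, np.nan],
    sigma=[[1.0, 0.8], [0.8, 1.0]],
)
```

In this toy case the measured pollutant sits 2 units above its CMAQ prediction, so the unmeasured pollutant's prediction is shifted upward by 0.8 × 2 from its own CMAQ value. With zero correlation the prediction would reduce to CMAQ alone, which is why the observed correlation structure matters for reducing prediction error.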