Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA.
Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA 30322, USA.
Environ Int. 2018 Dec;121(Pt 1):550-560. doi: 10.1016/j.envint.2018.09.051. Epub 2018 Oct 6.
Exposure to fine particulate matter (PM2.5) has been associated with a wide range of negative health outcomes. The overwhelming majority of the epidemiological studies that helped establish such associations were conducted in regions with sufficient ground observations and other supporting data, i.e., data-rich regions. However, air pollution health effects research in data-poor regions, where pollution levels are often the highest, is still very limited due to the lack of high-quality exposure estimates. To improve our understanding of the input datasets needed to apply satellite-based PM2.5 exposure models in data-poor areas, we applied a Bayesian ensemble model in the southeastern U.S., which was selected as a representative data-rich region. We designed four groups of sensitivity tests to simulate various data-poor scenarios. The factors considered as potential influences on model performance were the temporal sampling frequency of the monitors, the number of ground monitors, the accuracy of the chemical transport model simulation of PM2.5 concentrations, and different combinations of the additional predictors. While our full model achieved a 10-fold cross-validated (CV) R² of 0.82, we found that when the sampling frequency was reduced from the current 1-in-3 day to 1-in-9 day, the CV R² decreased to 0.58, and the predictions could not capture the daily variations of PM2.5. Half of the current stations (i.e., 30 monitors) could still support a robust model with a CV R² of 0.79. With 20 monitors, the CV R² decreased from 0.71 to 0.55 when 100% additional random errors were added to the original CMAQ simulations. However, with a sufficient number of ground monitors (e.g., 30 monitors), our Bayesian ensemble model tolerated CMAQ errors with only a slight decrease in CV R² (from 0.79 to 0.75). With fewer than 15 monitors, our full model collapsed and failed to fit any covariates, while the models with only time-varying variables could still converge even with only five monitors left. A model without the land use parameters lacked fine spatial details in the prediction maps, but could still capture the daily variability of PM2.5 (CV R² ≥ 0.67) and might support a study of the acute health effects of PM2.5 exposure.
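The data-poor scenarios described above (sparser sampling, fewer monitors, a noisier chemical transport model) can be illustrated with a minimal sketch. This is not the authors' code: the record layout, the function names, and the synthetic values are hypothetical, and the Gaussian error model is only one assumed reading of "100% additional random errors".

```python
import random


def thin_sampling(records, keep_every_days):
    """Keep only observations whose day index falls on a 1-in-N-day schedule."""
    return [r for r in records if r["day"] % keep_every_days == 0]


def subset_monitors(records, n_monitors, rng):
    """Keep observations from a random subset of n_monitors sites."""
    sites = sorted({r["site"] for r in records})
    kept = set(rng.sample(sites, n_monitors))
    return [r for r in records if r["site"] in kept]


def perturb_ctm(records, error_fraction, rng):
    """Add zero-mean Gaussian noise to the simulated concentration.

    Assumption: the noise standard deviation is error_fraction times the
    simulated value, and negative results are clipped to zero.
    """
    out = []
    for r in records:
        noisy = r["ctm"] + rng.gauss(0.0, error_fraction * r["ctm"])
        out.append({**r, "ctm": max(noisy, 0.0)})
    return out


# Synthetic stand-in for a data-rich network: 60 sites observed every third day.
rng = random.Random(0)
records = [
    {"site": s, "day": d, "pm25": 8.0 + 0.05 * s, "ctm": 10.0 + 0.1 * s}
    for s in range(60)
    for d in range(0, 90, 3)
]

thinned = thin_sampling(records, 9)       # 1-in-9-day sampling
half = subset_monitors(records, 30, rng)  # half of the monitors
noisy = perturb_ctm(half, 1.0, rng)       # 100% relative random error (assumed form)
```

Each degraded dataset would then be passed to the same model and cross-validation procedure as the full dataset, so that changes in CV performance can be attributed to the specific input that was degraded.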