Department of Environmental Science and Engineering, Sichuan University, Chengdu, Sichuan 610065, China.
Department of Land, Air, and Water Resources, University of California, Davis, CA 95616, United States.
Sci Total Environ. 2022 Jun 25;827:154278. doi: 10.1016/j.scitotenv.2022.154278. Epub 2022 Mar 3.
Until recently, Northern China was one of the most SO₂-polluted regions in the world. The lack of long-term and spatially resolved surface SO₂ data hinders retrospective evaluation of relevant environmental policies and human health effects. This study aims to derive the spatiotemporal distribution of surface SO₂ across Northern China during 2005-2019. As "concept drift" causes substantial estimation bias in back-extrapolation, we propose a new approach named the robust back-extrapolation via data augmentation approach (RBE-DA) to model the long-term surface SO₂. The results show that the population-weighted regional SO₂ ([SO₂]) increased from 2005 to 2007 and decreased steadily afterwards. The [SO₂] decreased by 80.4% from 74.2 ± 28.4 μg/m³ in 2007 to 14.6 ± 4.8 μg/m³ in 2019. The predicted spatial distributions for each year show that the SO₂ pollution was severe (more than 20 μg/m³) in most areas of Northern China until 2017. By using model interpretation methods, we visually reveal the mechanism of estimation bias in the back-extrapolation. Specifically, the training data is severely imbalanced with respect to the satellite-retrieved SO₂ column densities (i.e., it is short on high-value samples), so the benchmark model is unable to extrapolate the effects of this important predictor. This study provides long-term surface SO₂ data for post hoc evaluation and human exposure assessment in Northern China, while demonstrating that the interpretable machine learning approach is critical for model diagnostics and refinement. Leveraging satellite retrievals, the RBE-DA approach can be applied worldwide to back-extrapolate various measures of air quality.
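The population-weighted regional concentration mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the formula, so this assumes the common definition: the sum over grid cells of concentration times population, divided by the total population of the cells with valid data. The function name and the toy grid values are hypothetical and are not taken from the study.

```python
import numpy as np


def population_weighted_mean(conc, pop):
    """Population-weighted mean of gridded concentrations.

    Assumed definition (not given in the abstract):
        sum(conc_i * pop_i) / sum(pop_i)
    taken over grid cells with a finite concentration and positive population.
    """
    conc = np.asarray(conc, dtype=float)
    pop = np.asarray(pop, dtype=float)
    # Skip cells with missing concentration or no residents.
    valid = np.isfinite(conc) & np.isfinite(pop) & (pop > 0)
    if not valid.any():
        raise ValueError("no valid grid cells")
    return float(np.sum(conc[valid] * pop[valid]) / np.sum(pop[valid]))


# Hypothetical 2x2 grid: concentrations in ug/m^3 and residents per cell.
toy_conc = np.array([[80.0, 60.0], [20.0, np.nan]])
toy_pop = np.array([[1000.0, 3000.0], [1000.0, 500.0]])

pw_mean = population_weighted_mean(toy_conc, toy_pop)
print(f"population-weighted mean: {pw_mean:.1f} ug/m^3")
```

In this toy grid the cell with missing data is excluded from both the numerator and the denominator, so densely populated cells dominate the regional value, which is the reason such a metric is used for exposure assessment rather than a plain spatial average.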