Department of Environmental Science and Engineering, Fudan University, Shanghai, 200438, China.
Department of Atmospheric and Oceanic Sciences and Institute of Atmospheric Sciences, Fudan University, Shanghai, 200438, China; IRDR ICoE on Risk Interconnectivity and Governance on Weather/Climate Extremes Impact and Public Health, Fudan University, Shanghai, China.
Chemosphere. 2023 Jul;330:138742. doi: 10.1016/j.chemosphere.2023.138742. Epub 2023 Apr 19.
Estimating the effects of airborne particulate matter (PM) on climate and human health depends heavily on accurate prediction of its concentration and size distribution. High-complexity machine learning models have been widely used for PM concentration prediction, but such models are often regarded as "black boxes" that lack interpretability. Here, a LightGBM model with a simple structure is built for ground PM estimation, and the SHAP approach is used to separate the meteorological contributions, because meteorology strongly influences PM concentration. The models perform well, with correlation coefficients (R) of 0.84-0.88, 0.80-0.85, and 0.71-0.79 for PM, PM, and PM (2.5-10 μm), respectively. The LightGBM model trains 45 times faster than the XGBoost model while showing similar accuracy. More importantly, the models have small performance gaps between training and prediction (delta R: 0.07-0.12), indicating a reduced risk of overfitting. PM datasets (10 km resolution, daily) for the three size ranges are then generated over China from 2000 to 2020. The SHAP method agrees well with the meteorological normalization approach in separating the meteorological contributions (R > 0.5). In the Beijing-Tianjin-Hebei region (BTH), meteorology has a greater influence on PM (-5.66% to 9.99%) than on PM and PM. In the Yangtze River Delta (YRD) and the Pearl River Delta (PRD), albedo contributes strongly to PM concentration under the influence of solar radiation. Notably, relative humidity (RH) has different seasonal effects on PM of the three size ranges. In the BTH region, RH has negative effects on PM (-0.52 μg/m³) and positive effects on PM (1.01 μg/m³) and PM (3.39 μg/m³) in spring, but has the opposite effects in summer. The results of the SHAP approach are consistent with existing conclusions and imply its feasibility in explaining haze formation. The generated PM datasets are useful in health assessment, environmental management, and climate change studies.