Seasonal prediction of daily PM2.5 concentrations with interpretable machine learning: a case study of Beijing, China.

Author information

The State Key Laboratory of Molecular Vaccine and Molecular Diagnostics, School of Public Health, Xiamen University, Xiamen, 361102, China.

National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361102, China.

Publication information

Environ Sci Pollut Res Int. 2022 Jun;29(30):45821-45836. doi: 10.1007/s11356-022-18913-9. Epub 2022 Feb 12.

Abstract

Machine learning (ML) has shown high predictive ability in environmental research. Accurate estimation of daily PM2.5 concentrations is a prerequisite for addressing environmental public health issues. However, studies on the interpretability of ML algorithms remain limited. In this study, we aimed to estimate daily PM2.5 concentrations at a seasonal level and to understand the potential mechanisms behind ML algorithms' decisions with SHapley Additive exPlanations (SHAP). Daily ground PM2.5 concentrations and meteorological data were obtained from the Beijing Municipal Ecological and Environmental Monitoring Center and the China Meteorological Data Service Centre for December 2013 to November 2019. We calculated correlation coefficients and variance inflation factors (VIF) to eliminate collinear variables, and recursive feature elimination (RFE) was then used to select the more important predictors. A series of ML algorithms, including linear regression, variants of linear regression (Ridge, Lasso, Elasticnet), decision tree (DT), k-nearest neighbor (KNN), support vector regression (SVR), ensemble methods (random forest, RF; eXtreme Gradient Boosting, XGBoost), and deep learning (long short-term memory network, LSTM), were developed to estimate seasonal-level daily PM2.5 concentrations. Ten-fold cross-validation was used to tune hyperparameters, and root mean square error (RMSE), mean absolute error (MAE), ratio of performance to deviation (RPD), and Lin's concordance correlation coefficient (LCCC) were used to evaluate model performance. SHAP was used for local and global interpretability analysis. The distribution of PM2.5 concentrations in Beijing showed obvious seasonal patterns. Five variables (precipitation, mean wind speed, sunshine duration, mean surface temperature, and mean relative humidity) were selected for the final prediction. LSTM showed much higher accuracy than the traditional ML models, achieving the smallest RMSE of 19.58 µg/m³ and MAE of 15.11 µg/m³. On the selected data set, LSTM showed acceptable agreement (LCCC = 0.41 to 0.52) and accuracy (RPD = 0.97 to 1.92). The SHAP analyses revealed that the meteorological factors had different influences in specific predictions and illustrated their complex interactions. These results enhance our understanding of the relationships between meteorological factors and PM2.5 and help explain the mechanisms behind ML algorithms' decisions.
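
The abstract names four evaluation metrics but does not give their formulas. The sketch below uses the commonly cited definitions, which is an assumption about what the authors computed: RPD is taken as the sample standard deviation of the observations divided by the RMSE, and LCCC follows Lin's formulation with population (1/n) moments. The function names and the sample values are illustrative only and are not taken from the study.

```python
import math
from statistics import fmean, stdev


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(fmean((o - p) ** 2 for o, p in zip(obs, pred)))


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return fmean(abs(o - p) for o, p in zip(obs, pred))


def rpd(obs, pred):
    """Ratio of performance to deviation.

    Assumed definition: sample SD of the observations divided by RMSE.
    """
    return stdev(obs) / rmse(obs, pred)


def lccc(obs, pred):
    """Lin's concordance correlation coefficient (population moments)."""
    mean_o, mean_p = fmean(obs), fmean(pred)
    var_o = fmean((o - mean_o) ** 2 for o in obs)
    var_p = fmean((p - mean_p) ** 2 for p in pred)
    cov = fmean((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    return 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


if __name__ == "__main__":
    # Made-up daily concentrations in µg/m³, for illustration only.
    observed = [35.0, 80.0, 120.0, 60.0, 45.0]
    predicted = [40.0, 70.0, 110.0, 65.0, 50.0]
    print("RMSE:", rmse(observed, predicted))
    print("MAE: ", mae(observed, predicted))
    print("RPD: ", rpd(observed, predicted))
    print("LCCC:", lccc(observed, predicted))
```

Under these definitions, RMSE and MAE are in the units of the target (µg/m³), while RPD and LCCC are unitless. LCCC equals 1 only when predictions match observations exactly, so it penalizes systematic bias as well as scatter.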
