Seasonal prediction of daily PM2.5 concentrations with interpretable machine learning: a case study of Beijing, China.

Affiliations

The State Key Laboratory of Molecular Vaccine and Molecular Diagnostics, School of Public Health, Xiamen University, Xiamen, 361102, China.

National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361102, China.

Publication information

Environ Sci Pollut Res Int. 2022 Jun;29(30):45821-45836. doi: 10.1007/s11356-022-18913-9. Epub 2022 Feb 12.

DOI: 10.1007/s11356-022-18913-9
PMID: 35150424

Abstract

Machine learning (ML) has shown high predictive ability in environmental research. Accurate estimation of daily PM2.5 concentrations is a prerequisite for addressing environmental public health issues. However, studies on the interpretability of ML algorithms remain limited. In this study, we aimed to estimate daily PM2.5 concentrations at a seasonal level and to understand the potential mechanisms behind ML algorithms' decisions using SHapley Additive exPlanations (SHAP). Daily ground PM2.5 concentrations and meteorological data for December 2013 to November 2019 were obtained from the Beijing Municipal Ecological and Environmental Monitoring Center and the China Meteorological Data Service Centre. We calculated correlation coefficients and variance inflation factors (VIF) to eliminate collinear variables, and recursive feature elimination (RFE) was then used to select the more important predictors. A series of ML algorithms, including linear regression, variants of linear regression (Ridge, Lasso, Elasticnet), decision tree (DT), k-nearest neighbor (KNN), support vector regression (SVR), ensemble methods (random forest: RF; eXtreme Gradient Boosting: XGBoost), and deep learning (long short-term memory network: LSTM), were developed to estimate seasonal-level daily PM2.5 concentrations. Ten-fold cross-validation was used to tune hyperparameters, and root mean square error (RMSE), mean absolute error (MAE), ratio of performance to deviation (RPD), and Lin's concordance correlation coefficient (LCCC) were used to evaluate model performance. SHAP was applied for local and global interpretability analysis. The distribution of PM2.5 concentrations in Beijing showed clear seasonal patterns. Five variables (precipitation, mean wind speed, sunshine duration, mean surface temperature, and mean relative humidity) were selected for the final prediction. LSTM showed much higher accuracy than the traditional ML models, achieving the smallest RMSE (19.58 µg/m³) and MAE (15.11 µg/m³). On the selected data set, LSTM showed acceptable agreement (LCCC = 0.41 to 0.52) and accuracy (RPD = 0.97 to 1.92). The SHAP analyses revealed that the meteorological factors had different influences in specific predictions, and they also illustrated complex interactions among the factors. These results enhance our understanding of the relationships between meteorological factors and PM2.5 and help explain the mechanisms behind ML algorithms' decisions.
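The four evaluation metrics named in the abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the abstract does not give the exact formulas the authors used, so RPD is taken here as the sample standard deviation of the observations divided by RMSE, and LCCC follows Lin's usual definition with population (divide-by-n) moments. The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math
from statistics import fmean, stdev


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(fmean([(o - p) ** 2 for o, p in zip(obs, pred)]))


def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return fmean([abs(o - p) for o, p in zip(obs, pred)])


def rpd(obs, pred):
    """Ratio of performance to deviation.

    Assumed definition: sample SD of the observations divided by RMSE.
    """
    return stdev(obs) / rmse(obs, pred)


def lccc(obs, pred):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(obs)
    mean_o, mean_p = fmean(obs), fmean(pred)
    var_o = sum((o - mean_o) ** 2 for o in obs) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    return 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


if __name__ == "__main__":
    # Made-up daily concentrations in µg/m³, for illustration only.
    observed = [10.0, 20.0, 30.0, 40.0]
    predicted = [12.0, 18.0, 33.0, 37.0]
    for name, fn in [("RMSE", rmse), ("MAE", mae), ("RPD", rpd), ("LCCC", lccc)]:
        print(f"{name}: {fn(observed, predicted):.3f}")
```

RMSE and MAE are in the units of the target, so lower is better, while RPD and LCCC are unitless and higher is better. LCCC equals 1 only when predictions match observations exactly, because it penalizes both scatter and systematic shifts in mean or scale.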

Similar articles

1. Seasonal prediction of daily PM2.5 concentrations with interpretable machine learning: a case study of Beijing, China.
   Environ Sci Pollut Res Int. 2022 Jun;29(30):45821-45836. doi: 10.1007/s11356-022-18913-9. Epub 2022 Feb 12.
2. Estimating particulate matter concentrations and meteorological contributions in China during 2000-2020.
   Chemosphere. 2023 Jul;330:138742. doi: 10.1016/j.chemosphere.2023.138742. Epub 2023 Apr 19.
3. A machine learning method to estimate PM concentrations across China with remote sensing, meteorological and land use information.
   Sci Total Environ. 2018 Sep 15;636:52-60. doi: 10.1016/j.scitotenv.2018.04.251. Epub 2018 Apr 25.
4. Deep Ensemble Machine Learning Framework for the Estimation of Concentrations.
   Environ Health Perspect. 2022 Mar;130(3):37004. doi: 10.1289/EHP9752. Epub 2022 Mar 7.
5. Construction of a virtual PM observation network in China based on high-density surface meteorological observations using the Extreme Gradient Boosting model.
   Environ Int. 2020 Aug;141:105801. doi: 10.1016/j.envint.2020.105801. Epub 2020 May 29.
6. Enhanced PM2.5 estimation across China: An AOD-independent two-stage approach incorporating improved spatiotemporal heterogeneity representations.
   J Environ Manage. 2024 Sep;368:122107. doi: 10.1016/j.jenvman.2024.122107. Epub 2024 Aug 9.
7. Development of a stacked ensemble model for forecasting and analyzing daily average PM concentrations in Beijing, China.
   Sci Total Environ. 2018 Sep 1;635:644-658. doi: 10.1016/j.scitotenv.2018.04.040. Epub 2018 Apr 24.
8. Predicting ground-level PM concentrations in the Beijing-Tianjin-Hebei region: A hybrid remote sensing and machine learning approach.
   Environ Pollut. 2019 Jun;249:735-749. doi: 10.1016/j.envpol.2019.03.068. Epub 2019 Mar 22.
9. A land use regression model using machine learning and locally developed low cost particulate matter sensors in Uganda.
   Environ Res. 2021 Aug;199:111352. doi: 10.1016/j.envres.2021.111352. Epub 2021 May 24.
10. Contributions of various driving factors to air pollution events: Interpretability analysis from Machine learning perspective.
    Environ Int. 2023 Mar;173:107861. doi: 10.1016/j.envint.2023.107861. Epub 2023 Mar 4.

Cited by

1. Utility of low-cost sensor measurement for predicting ambient PM concentrations: evidence from a monitoring network in Accra, Ghana.
   Environ Sci Atmos. 2025 Apr 1;5(4):517-529. doi: 10.1039/d4ea00140k. Epub 2025 Mar 10.
2. Predictive modelling of air pollution affecting human tuberculosis risk on Mainland China.
   Sci Rep. 2025 Jul 2;15(1):23633. doi: 10.1038/s41598-025-08078-z.
3. Country-specific determinants for COVID-19 case fatality rate and response strategies from a global perspective: an interpretable machine learning framework.
   Popul Health Metr. 2024 Jun 3;22(1):10. doi: 10.1186/s12963-024-00330-4.
4. The role of booster vaccination in decreasing COVID-19 age-adjusted case fatality rate: Evidence from 32 countries.
   Front Public Health. 2023 Apr 18;11:1150095. doi: 10.3389/fpubh.2023.1150095. eCollection 2023.