Identifying a suitable model for predicting hourly pollutant concentrations by using low-cost microstation data and machine learning.

Affiliations

Chinese Research Academy of Environmental Sciences, Beijing, China.

Higher Institute of Computer Modeling and Their Applications, Clermont Auvergne University, Clermont-Ferrand, France.

Publication information

Sci Rep. 2022 Nov 19;12(1):19949. doi: 10.1038/s41598-022-24470-5.

Abstract

Accurately predicting the concentration of PM2.5 (fine particles with a diameter of 2.5 μm or less) is essential for health risk assessment and for formulating air pollution control strategies. Large amounts of air pollution data are now available, and efficiently mining their hidden features to forecast future pollutant concentrations is important for air pollution prevention and control. We therefore built pollutant prediction models based on LightGBM (Light Gradient Boosting Machine), a shallow machine learning method, and on a Long Short-Term Memory (LSTM) neural network. First, PM2.5 concentration data from 34 air quality stations in Beijing were matched in time and space with data from 18 weather stations to obtain an input data set. The input data set was then cleaned and preprocessed, and the training set was obtained through input feature extraction, input factor normalization, and outlier processing. Hourly PM2.5 concentrations were predicted in experiments using hourly Beijing data from January 1, 2018 to October 1, 2020, and the best hourly series prediction results were selected by comparing the two models. The comparison shows that the RMSE of the LSTM model for each pollutant is nearly 50% lower than that of LightGBM, and that the LSTM predictions follow the curve of the actual observations more closely. Exploring the input step size of the LSTM model showed that 3-h input data gave higher accuracy than 12-h input data. The approach can support management and decision-making by environmental protection departments and the formulation of preventive measures for emergency pollution incidents.
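The preprocessing and evaluation steps named in the abstract (input factor normalization, a fixed input step size, and RMSE) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the min-max scaling choice, and the sample values are assumptions made for illustration, and a simple persistence forecast stands in for the LightGBM and LSTM models.

```python
import math


def min_max_normalize(values):
    """Scale values to [0, 1]; also return (lo, hi) so the scaling can be undone.

    Min-max scaling is an assumption; the abstract only says "normalization".
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values], (lo, hi)
    return [(v - lo) / span for v in values], (lo, hi)


def make_windows(series, n_in):
    """Pair each run of n_in consecutive hourly values with the next hour's value."""
    inputs, targets = [], []
    for start in range(len(series) - n_in):
        inputs.append(series[start:start + n_in])
        targets.append(series[start + n_in])
    return inputs, targets


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up hourly PM2.5 values (µg/m³), for illustration only.
hourly_pm25 = [35.0, 42.0, 50.0, 47.0, 39.0, 30.0, 28.0, 33.0]

scaled, (lo, hi) = min_max_normalize(hourly_pm25)

# A 3-hour input step: three past hours are used to predict the next hour.
x3, y3 = make_windows(hourly_pm25, n_in=3)

# Persistence baseline (hypothetical stand-in for a trained model):
# predict that the next hour equals the last hour in the window.
baseline = [window[-1] for window in x3]
baseline_rmse = rmse(y3, baseline)
```

Changing `n_in` from 3 to 12 reproduces the kind of input step comparison the abstract describes, with the trained model's predictions taking the place of `baseline`.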

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/892d/9675857/72d97df444a6/41598_2022_24470_Fig1_HTML.jpg
