Design of a Spark Big Data Framework for PM2.5 Air Pollution Forecasting.

Author affiliations

Department of Information Management, National Yunlin University of Science & Technology, Douliu 64002, Taiwan.

Faculty of Environment, University of Science, 227 Nguyen Van Cu Street, District 5, Ho Chi Minh City 700000, Vietnam.

Publication information

Int J Environ Res Public Health. 2021 Jul 2;18(13):7087. doi: 10.3390/ijerph18137087.

DOI:10.3390/ijerph18137087

PMID:34281023

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8296958/

Abstract

In recent years, rapid economic development has made air pollution extremely serious, with many negative effects on health, the environment, and medical costs. PM2.5 is one of the main components of air pollution, so knowing PM2.5 air quality in advance is important for protecting health. Many air quality studies rely on the government's official monitoring stations, which cannot be widely deployed because of their high cost. Furthermore, government stations update only once an hour, which makes it hard to capture short-term PM2.5 concentration peaks that arrive with little warning. Short-interval data from many stations, however, is huge in volume and complex to process, analyze, and predict. Distributing the work on Spark alleviates the high computational requirements of the original predictor, which makes Spark suitable for this problem. This study proposes an instant PM2.5 prediction architecture based on the Spark big data framework to handle the huge volume of data from the LASS community. The framework is divided into three modules. It collects real-time PM2.5 data and performs ensemble learning over three machine learning algorithms (Linear Regression, Random Forest, and Gradient Boosting Decision Tree) to predict the PM2.5 concentration 30 to 180 min ahead, with an accompanying visualization graph. The experimental results show that the proposed Spark ensemble prediction model performs best for the 30-min-ahead prediction (R² up to 0.96) and outperforms every single machine learning model. Taiwan has suffered from relatively poor air quality for a long time. Air pollutant monitoring data from the LASS community can provide broader monitoring coverage, but the data is large and difficult to integrate or analyze.
The proposed Spark big data framework can provide short-term PM2.5 forecasts and help decision-makers take proper action immediately.
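
The ensemble step described in the abstract can be illustrated with a minimal sketch in plain Python. The abstract does not say how the three base models are combined, so unweighted averaging is an assumption here, and the function names `r_squared` and `average_ensemble` are hypothetical. All numbers are made-up toy values chosen for illustration; they are not data or results from the study, and the code does not use Spark.

```python
from statistics import mean


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def average_ensemble(*model_predictions):
    """Combine per-model predictions by unweighted averaging.

    Assumption: the paper's actual combination rule may differ.
    """
    return [mean(values) for values in zip(*model_predictions)]


# Toy PM concentrations (µg/m³) observed 30 min after each input window.
actual = [12.0, 15.0, 20.0, 18.0, 25.0]

# Toy predictions standing in for the three base models' outputs.
base_predictions = {
    "linear regression": [13.0, 14.0, 18.0, 19.0, 23.0],
    "random forest": [11.0, 16.0, 21.0, 17.0, 26.0],
    "gradient boosting": [12.5, 15.5, 19.0, 18.5, 24.0],
}

ensemble_pred = average_ensemble(*base_predictions.values())

# Report the fit of each base model and of the averaged ensemble.
for name, pred in base_predictions.items():
    print(f"{name}: R^2 = {r_squared(actual, pred):.3f}")
print(f"ensemble: R^2 = {r_squared(actual, ensemble_pred):.3f}")
```

In this toy data the base models' errors partly cancel, so the average fits better than any single model. That outcome comes from how the numbers were chosen and is not guaranteed in general.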


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3808/8296958/2513a0a6e8b0/ijerph-18-07087-g001.jpg
