Design of a Spark Big Data Framework for PM Air Pollution Forecasting.

Affiliations

Department of Information Management, National Yunlin University of Science & Technology, Douliu 64002, Taiwan.

Faculty of Environment, University of Science, 227 Nguyen Van Cu Street, District 5, Ho Chi Minh City 700000, Vietnam.

Publication information

Int J Environ Res Public Health. 2021 Jul 2;18(13):7087. doi: 10.3390/ijerph18137087.

Abstract

In recent years, rapid economic development has made air pollution extremely serious, with negative effects on health, the environment, and medical costs. PM is one of the main components of air pollution, so knowing PM air quality in advance is important for protecting health. Many air quality studies rely on the government's official monitoring stations, which cannot be widely deployed because of their high cost. Furthermore, government stations update only once an hour, which makes it hard to capture short-term PM concentration peaks that arrive with little warning. However, short-interval data from many stations is huge in volume and complex to compute, analyze, and predict. Using Spark alleviates the high computational requirements of the original predictor, which makes it suitable for this problem. This study proposes an instant PM prediction architecture based on the Spark big data framework to handle the huge volume of data from the LASS community. The proposed framework is divided into three modules. It collects real-time PM data and performs ensemble learning with three machine learning algorithms (Linear Regression, Random Forest, and Gradient Boosting Decision Tree) to predict the PM concentration over the next 30 to 180 min, with an accompanying visualization graph. The experimental results show that the proposed Spark big data ensemble prediction model performs best for the next-30-min prediction (R up to 0.96), and that the ensemble model outperforms any single machine learning model. Taiwan has long suffered from relatively poor air quality. Air pollutant monitoring data from the LASS community can provide broader monitoring coverage; however, the data is large and difficult to integrate or analyze.
The proposed Spark big data framework system can provide short-term PM forecasts and help decision-makers take proper action immediately.
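
The abstract does not state how the outputs of the three models are combined, so the following is only a minimal sketch that assumes equal-weight averaging of per-sample predictions and scores the result with the coefficient of determination. The function names (`ensemble_average`, `r_squared`) and all numbers are hypothetical illustrations, not data or code from the study. It uses plain Python instead of Spark so that it runs anywhere; in Spark MLlib the three base learners would correspond to `LinearRegression`, `RandomForestRegressor`, and `GBTRegressor` in `pyspark.ml.regression`.

```python
from statistics import mean


def ensemble_average(predictions_by_model):
    """Combine per-sample predictions from several models by equal-weight averaging.

    Assumption: the study's actual combination rule is not given in the abstract;
    simple averaging is used here purely for illustration.
    """
    columns = list(predictions_by_model.values())
    if not columns:
        raise ValueError("at least one model is required")
    n_samples = len(columns[0])
    if any(len(column) != n_samples for column in columns):
        raise ValueError("all models must predict the same number of samples")
    # zip(*columns) groups the predictions that belong to the same sample.
    return [mean(sample_predictions) for sample_predictions in zip(*columns)]


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical PM concentrations observed 30 min after each forecast was issued.
actual = [10.0, 20.0, 30.0, 40.0]

# Hypothetical forecasts from the three base learners named in the abstract.
predictions = {
    "linear_regression": [12.0, 18.0, 33.0, 37.0],
    "random_forest": [9.0, 22.0, 28.0, 42.0],
    "gbdt": [11.0, 20.0, 29.0, 41.0],
}

ensemble = ensemble_average(predictions)
scores = {name: r_squared(actual, values) for name, values in predictions.items()}
scores["ensemble"] = r_squared(actual, ensemble)

for name, score in scores.items():
    print(f"{name}: R^2 = {score:.4f}")
```

In this toy data the individual models err in different directions, so averaging cancels part of the error and the ensemble scores highest. That outcome is not guaranteed in general: averaging helps most when the base models' errors are weakly correlated.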

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3808/8296958/2513a0a6e8b0/ijerph-18-07087-g001.jpg
