Department of Computer Science and Information Engineering, National Taipei University, New Taipei City, Taiwan.
Galgotias University, Greater Noida, Uttar Pradesh, India.
Environ Sci Pollut Res Int. 2020 Oct;27(30):38155-38168. doi: 10.1007/s11356-020-09855-1. Epub 2020 Jul 3.
As the economy and industry advance, the impact of air pollution has drawn growing attention. Many studies have applied machine learning techniques to build predictive models for pollutant concentration or air quality. However, improving prediction performance remains a common problem in existing studies. Traditional models based on machine learning and deep learning methods, such as GBTR (gradient boosted tree regression), SVR (support vector machine-based regression), and LSTM (long short-term memory), are among the most promising approaches to this problem. Previous research has shown that ensemble learning can improve predictive performance in other domains. In this paper, we propose a hybrid model and framework to improve the accuracy of air pollution forecasting. We use a stacking-based ensemble learning scheme, with the Pearson correlation coefficient calculating the correlation between different machine learning models, to integrate various forecasting models. We also construct a framework based on Spark+Hadoop machine learning and the TensorFlow deep learning framework that physically integrates these models and forecasts air pollution for the next 1 to 8 h. We conduct experiments and compare the results with the GBTR, SVR, LSTM, and LSTM2 (version 2) models to demonstrate the predictive performance of the proposed hybrid model. The experimental results show that the hybrid model is superior to the existing models used for predicting air pollution.
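The abstract does not spell out how the Pearson correlation coefficient enters the stacking scheme. The sketch below shows one plausible reading, not the authors' implementation: correlations between base-model predictions on a held-out set are used to drop near-duplicate models, and a linear meta-learner is then fitted on the remaining predictions. The names `model_a`, `model_b`, `model_c`, the threshold of 0.999, the least-squares meta-learner, and all numbers are illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_diverse(preds, threshold=0.999):
    """Greedily keep models that are not near-duplicates of a kept model.

    The threshold is an assumed value, not one taken from the paper.
    """
    kept = []
    for name, p in preds.items():
        if all(abs(pearson(p, preds[k])) < threshold for k in kept):
            kept.append(name)
    return kept

def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_meta(columns, y):
    """Least-squares meta-learner: y ~ w0 + sum_j w_j * columns[j]."""
    feats = [[1.0] * len(y)] + [list(c) for c in columns]
    gram = [[sum(u * v for u, v in zip(fi, fj)) for fj in feats] for fi in feats]
    rhs = [sum(u * t for u, t in zip(fi, y)) for fi in feats]
    return solve(gram, rhs)

def predict_meta(weights, columns):
    """Apply the fitted meta-learner to base-model prediction columns."""
    n = len(columns[0])
    return [weights[0] + sum(w * c[i] for w, c in zip(weights[1:], columns))
            for i in range(n)]

def mse(pred, y):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

# Toy held-out targets and base-model predictions (made-up numbers).
y = [10.0, 12.0, 15.0, 11.0, 18.0, 20.0]
preds = {
    "model_a": [11.0, 11.5, 14.0, 12.0, 17.0, 21.0],
    "model_b": [11.5, 12.0, 14.5, 12.5, 17.5, 21.5],  # model_a shifted by 0.5
    "model_c": [9.0, 13.0, 16.0, 10.0, 19.0, 19.0],
}

kept = select_diverse(preds)               # drops the redundant model_b
columns = [preds[k] for k in kept]
weights = fit_meta(columns, y)             # intercept plus one weight per kept model
stacked = predict_meta(weights, columns)
print(kept, [round(w, 3) for w in weights], round(mse(stacked, y), 3))
```

In practice the meta-learner would be fitted on out-of-fold predictions and evaluated on a separate test window; fitting and scoring on the same six points here only keeps the sketch short.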