School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China.
Environ Sci Pollut Res Int. 2020 Aug;27(23):28931-28948. doi: 10.1007/s11356-020-08948-1. Epub 2020 May 17.
Data-driven statistical air quality prediction methods can usually be built quickly and reach moderate accuracy, and they have been widely studied in recent years. However, air quality prediction is complex and usually involves multiple factors, such as meteorological, spatial, and temporal properties, so it remains a challenge to propose a model with the required accuracy. In this paper, we propose a hybrid ensemble model, CERL, which exploits the merits of both feedforward neural networks and recurrent neural networks, the latter being designed for time-series data, to predict air quality hourly. Measured air pollutant factors, including the Air Quality Index (AQI), PM2.5, PM10, CO, SO2, NO2, and O3, are used as input to predict air quality from 1 to 8 h ahead. In an evaluation of air quality prediction in Lanzhou and Xi'an, two important provincial capitals in Northwest China, CERL outperforms the baseline models. Moreover, the improvement grows as the prediction step length increases. For example, the improvements of CERL in the 1-step, 3-step, 5-step, and 8-step prediction for PM in Lanzhou are 1.82%, 8.01%, 9.98%, and 20.03%, respectively. The superiority of CERL is also supported by a Diebold-Mariano hypothesis test at the 5% significance level.
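The abstract names the Diebold-Mariano test but does not say how it was configured. As a rough illustration only, the following is a minimal sketch of the textbook two-sided test, assuming squared-error loss and a long-run variance built from autocovariances up to lag h-1 for an h-step forecast. The function name `diebold_mariano`, its parameters, and the sample numbers are hypothetical and are not taken from the paper, which may use a different loss function, a one-sided alternative, or a small-sample correction.

```python
import math


def diebold_mariano(actual, pred_a, pred_b, horizon=1):
    """Return (DM statistic, two-sided p-value) under squared-error loss.

    A positive statistic means model A has the larger average loss.
    Hypothetical helper for illustration; not the paper's implementation.
    """
    n = len(actual)
    # Loss differential d_t = e_a(t)^2 - e_b(t)^2 for each time step.
    d = [(y - a) ** 2 - (y - b) ** 2 for y, a, b in zip(actual, pred_a, pred_b)]
    d_mean = sum(d) / n

    def autocov(k):
        # Sample autocovariance of the loss differential at lag k.
        return sum((d[t] - d_mean) * (d[t - k] - d_mean) for t in range(k, n)) / n

    # Long-run variance: lag 0 plus twice the autocovariances up to horizon-1.
    long_run_var = autocov(0) + 2 * sum(autocov(k) for k in range(1, horizon))
    if long_run_var <= 0:
        raise ValueError("non-positive variance estimate; test is undefined")

    stat = d_mean / math.sqrt(long_run_var / n)
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(stat) / math.sqrt(2))))
    return stat, p_value


# Made-up illustrative series (not data from the paper).
observed = [float(i % 7) for i in range(50)]
baseline = [y + 1.0 + 0.1 * (i % 3) for i, y in enumerate(observed)]
candidate = [y + 0.1 for y in observed]

stat, p_value = diebold_mariano(observed, baseline, candidate, horizon=1)
print(f"DM statistic = {stat:.2f}, reject equal accuracy at 5%: {p_value < 0.05}")
```

With a 5% significance level, the null hypothesis of equal predictive accuracy is rejected when the p-value falls below 0.05. For multi-step forecasts, `horizon` would be set to the step length so that the serial correlation in the loss differential is accounted for.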