Forecasting air quality time series using deep learning.

Affiliations

a School of Engineering, University of Guelph, Guelph, Ontario, Canada.

b Lakes Environmental, Waterloo, Ontario, Canada.

Publication information

J Air Waste Manag Assoc. 2018 Aug;68(8):866-886. doi: 10.1080/10962247.2018.1459956. Epub 2018 May 24.

DOI:10.1080/10962247.2018.1459956

PMID:29652217

Abstract

This paper presents one of the first applications of deep learning (DL) techniques to predict air pollution time series. Air quality management relies extensively on time series data captured at air monitoring stations as the basis for identifying population exposure to airborne pollutants and determining compliance with local ambient air standards. In this paper, 8-hour averaged surface ozone (O₃) concentrations were predicted using a deep learning model consisting of a recurrent neural network (RNN) with long short-term memory (LSTM). Hourly air quality and meteorological data were used to train the network and to forecast values up to 72 hours ahead with low error rates. The LSTM was also able to forecast the duration of continuous O₃ exceedances. Prior to training the network, the dataset was reviewed for missing data and outliers. Missing data were imputed using a novel technique that filled gaps of fewer than eight time steps by averaging, with incremental steps based on first-order differences of neighboring time periods. The data were then used to train decision trees to evaluate input feature importance over different prediction horizons. The number of features used to train the LSTM model was reduced from 25 to 5, which improved accuracy as measured by mean absolute error (MAE). Parameter sensitivity analysis showed that the look-back nodes of the RNN were a significant source of error if not aligned with the prediction horizon. Overall, MAEs below 2 were obtained for predictions out to 72 hours.
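The abstract describes the imputation step only briefly. The sketch below is a minimal version built on one assumption: each gap of fewer than eight hourly steps is bridged with equal increments. Those increments come from the first-order difference between the valid readings on either side of the gap, which amounts to plain linear interpolation. The function name `impute_short_gaps` is hypothetical. So is the choice to leave gaps at the series edges, and gaps of eight or more steps, unfilled. This is not the authors' implementation.

```python
from typing import List, Optional


def impute_short_gaps(
    series: List[Optional[float]], max_gap: int = 8
) -> List[Optional[float]]:
    """Fill runs of missing values (None) shorter than `max_gap` steps.

    Assumption (not the paper's exact method): a gap is bridged with equal
    increments equal to the first-order difference between the valid values
    before and after it, divided by (gap length + 1). Gaps touching either end
    of the series, and gaps of `max_gap` steps or more, are left as None.
    """
    out = list(series)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i  # index of the first missing value in this gap
        while i < n and out[i] is None:
            i += 1
        end = i  # index of the first valid value after the gap (or n)
        gap = end - start
        if start == 0 or end == n or gap >= max_gap:
            continue  # no neighbor on one side, or gap too long to fill
        before, after = out[start - 1], out[end]
        step = (after - before) / (gap + 1)  # spread the difference evenly
        for k in range(gap):
            out[start + k] = before + step * (k + 1)
    return out


# Usage on synthetic values (not data from the paper):
print(impute_short_gaps([30.0, None, None, 36.0, None]))
# → [30.0, 32.0, 34.0, 36.0, None]
```

The trailing `None` stays missing because it has no right-hand neighbor. A fuller version might substitute a seasonal or time-of-day estimate there. The paper's implications suggest its method accounts for time and season, but this sketch does not attempt that.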

IMPLICATIONS

Novel deep learning techniques were used to train an 8-hour averaged ozone forecast model. Missing data and outliers within the captured data set were replaced using a new imputation method that generated values closer to the expected value for the time and season. Decision trees were used to identify the input variables of greatest importance. The methods presented in this paper allow air managers to forecast air pollution concentrations over long time horizons while monitoring only key parameters and without transforming the entire data set, thus allowing real-time inputs and continuous prediction.
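The forecasting setup summarized in the abstract rests on a few simple data-preparation steps. It uses an 8-hour averaged target, an LSTM whose look-back must be matched to the prediction horizon, and MAE as the error metric. The code below is a hedged, dependency-free sketch of those steps, not the paper's code. It computes a trailing 8-hour mean over hourly values, builds a sliding look-back window that pairs each input sequence with the target `horizon` steps ahead, and calculates MAE. The helper names `rolling_8h_mean`, `make_windows` and `mae` are hypothetical. The resulting `X`/`y` lists would then be passed to an LSTM in whichever DL framework is used.

```python
from typing import List, Sequence, Tuple


def rolling_8h_mean(hourly: Sequence[float], window: int = 8) -> List[float]:
    """Trailing mean over `window` hours; output[0] averages hours 0..window-1."""
    return [
        sum(hourly[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(hourly))
    ]


def make_windows(
    features: Sequence[Sequence[float]],
    target: Sequence[float],
    look_back: int,
    horizon: int,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Build supervised samples for a sequence model (hypothetical helper).

    Each input is the `look_back` feature rows ending at step t, and its label
    is target[t + horizon], i.e. the value `horizon` steps after the window.
    """
    X: List[List[List[float]]] = []
    y: List[float] = []
    for end in range(look_back, len(target) - horizon + 1):
        X.append([list(row) for row in features[end - look_back : end]])
        y.append(target[end - 1 + horizon])
    return X, y


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


# Usage on synthetic values (not data from the paper):
hourly_o3 = [float(h) for h in range(20)]
o3_8h = rolling_8h_mean(hourly_o3)
X, y = make_windows([[v] for v in o3_8h], o3_8h, look_back=4, horizon=2)
print(o3_8h[:3], len(X), y[0])  # → [3.5, 4.5, 5.5] 8 8.5
```

Keeping `look_back` and `horizon` as separate parameters makes the alignment between them explicit. The abstract identifies misalignment between the two as a significant source of error, so a sensitivity sweep would vary `look_back` for each fixed `horizon` and compare the resulting MAE.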
