College of IoT Engineering, Hohai University, Changzhou 213022, China; School of Information and Engineering, Changzhou University, Changzhou 213164, China; Jiangsu Key Laboratory of Special Robot Technology, Changzhou 213022, China.
College of IoT Engineering, Hohai University, Changzhou 213022, China; School of Information and Engineering, Changzhou University, Changzhou 213164, China.
Environ Int. 2020 Jun;139:105713. doi: 10.1016/j.envint.2020.105713. Epub 2020 Apr 11.
Incomplete observation of hourly air pollutant concentration data is a common problem in urban air quality monitoring networks. This research proposes a spatial interpolation method, based on low rank matrix completion (LRMC), to impute missing values in air pollutant data sets. The method exploits the high correlation and consistency of air pollutant data in its spatial distribution. We evaluate the proposed method on concentration time series of several air pollutants (NO2, O3, SO2, PM2.5, PM10) in terms of root mean square error (RMSE), index of agreement (D), and goodness of fit (R2). It is systematically compared with established imputation techniques, including nearest neighbor, mean substitution, regression-based imputation, spline interpolation, the spectral method, and the regularized expectation maximization (EM) algorithm. As a spatial imputation method, LRMC outperforms the other methods used in this research at larger missing ratios (such as 30% removal) on the central air pollutant monitoring station. Across all monitoring stations, comprehensive experimental results show that LRMC always generates robust results, replacing missing data with reasonable substitutions, and that it is not sensitive to the length of missing gaps. The promising imputation performance of LRMC in terms of R2 demonstrates that it can effectively impute missing values of air pollutant time series based on their inherent patterns.
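To make the idea concrete, the following is a minimal sketch of spatial imputation by low rank matrix completion, written under several assumptions that are not stated in the abstract. It uses a generic iterative singular-value soft-thresholding scheme (often called soft-impute), which is one common way to do LRMC and is not claimed to be the authors' algorithm. The function names (`soft_impute`, `rmse`, `index_of_agreement`, `r_squared`), the parameter `lam`, and the synthetic hours-by-stations matrix are all hypothetical, and the goodness of fit is taken here to be the squared Pearson correlation.

```python
import numpy as np

def soft_impute(X, mask, lam=0.5, n_iter=500, tol=1e-7):
    """Fill entries of X where mask is False with a low-rank estimate.

    Generic soft-impute sketch (an assumption, not the paper's method):
    repeatedly shrink the singular values by lam and restore the
    observed entries.
    """
    Z = np.where(mask, X, 0.0)  # start with zeros in the gaps
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        low_rank = (U * np.maximum(s - lam, 0.0)) @ Vt
        Z_new = np.where(mask, X, low_rank)  # keep observed values fixed
        change = np.linalg.norm(Z_new - Z)
        Z = Z_new
        if change <= tol * max(np.linalg.norm(Z), 1.0):
            break
    return Z

def rmse(obs, pred):
    """Root mean square error."""
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

def index_of_agreement(obs, pred):
    """Willmott's index of agreement, between 0 and 1."""
    obar = obs.mean()
    denom = np.sum((np.abs(pred - obar) + np.abs(obs - obar)) ** 2)
    return float(1.0 - np.sum((obs - pred) ** 2) / denom)

def r_squared(obs, pred):
    """Squared Pearson correlation (assumed meaning of goodness of fit)."""
    return float(np.corrcoef(obs, pred)[0, 1] ** 2)

# Synthetic example: 200 hours x 8 stations sharing two latent patterns.
rng = np.random.default_rng(0)
patterns = rng.normal(size=(200, 2))
loadings = np.array([[1.0, 0.8, 1.2, 0.9, 1.1, 0.7, 1.3, 1.0],
                     [0.5, -0.4, 0.3, -0.6, 0.2, 0.6, -0.3, -0.5]])
truth = patterns @ loadings + 0.05 * rng.normal(size=(200, 8))

# Remove 30% of the readings from station 0 (the hypothetical "central" one).
mask = np.ones_like(truth, dtype=bool)
missing_rows = rng.choice(200, size=60, replace=False)
mask[missing_rows, 0] = False
observed = np.where(mask, truth, np.nan)

filled = soft_impute(observed, mask)
obs_vals = truth[missing_rows, 0]
lrmc_vals = filled[missing_rows, 0]
mean_vals = np.full(60, np.nanmean(observed[:, 0]))  # mean substitution baseline

print("RMSE  LRMC:", rmse(obs_vals, lrmc_vals), " mean:", rmse(obs_vals, mean_vals))
print("D     LRMC:", index_of_agreement(obs_vals, lrmc_vals))
print("R2    LRMC:", r_squared(obs_vals, lrmc_vals))
```

The sketch relies on the same intuition as the abstract: when stations are strongly correlated in space, the hours-by-stations matrix is close to low rank, so gaps at one station can be estimated from the readings at the others regardless of how long each gap is.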