Global Centre for Clean Air Research (GCARE), Department of Civil and Environmental Engineering, Faculty of Engineering and Physical Sciences, University of Surrey, Guildford GU2 7XH, UK.
Environ Sci Process Impacts. 2019 Apr 17;21(4):701-713. doi: 10.1039/c8em00593a.
Air pollution is a major environmental health problem worldwide and needs to be monitored. In recent years, a new generation of low-cost air pollution sensors has emerged. Among other factors, their widespread adoption has been held back by poor or unknown data quality. This stems both from the intrinsic properties of the sensors and from the lack of consensus on how their data should be processed. To contribute to building such a consensus, we reviewed the available methodologies for quality control, outlier detection and gap filling. We then applied two outlier detection methodologies and five gap-filling methodologies to a case study: an 11-month air quality data set from a low-cost sensor. We showed that erroneous data can be detected in a fully automated way. We also showed that both point and contextual outlier detection methodologies can be applied to low-cost air pollution data and yield meaningful results. Of the gap-filling methods tested, linear interpolation performed best for low-cost air pollution sensor data. In conclusion, data cleaning procedures are important, and the presented methods can form part of a generalised data processing methodology for low-cost air pollution sensors.
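As an illustration of how point outlier detection and linear-interpolation gap filling can be chained, here is a minimal sketch. The abstract does not say which point-outlier test the study used. This sketch substitutes a generic median-absolute-deviation (modified z-score) rule and does not attempt contextual outlier detection. It assumes equally spaced readings held in a Python list, with `None` marking missing values. The function names and the 3.5 threshold are illustrative assumptions, not taken from the paper.

```python
from statistics import median
from typing import List, Optional


def flag_point_outliers(values: List[Optional[float]], threshold: float = 3.5) -> List[bool]:
    """Flag point outliers using the modified z-score (median absolute deviation).

    Illustrative stand-in only: not necessarily the method used in the study.
    Missing readings (None) are never flagged.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return [False] * len(values)
    med = median(observed)
    mad = median(abs(v - med) for v in observed)
    if mad == 0:
        # All observed readings are (nearly) identical, so nothing stands out.
        return [False] * len(values)
    return [v is not None and abs(0.6745 * (v - med) / mad) > threshold for v in values]


def fill_gaps_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps (None) by linear interpolation between the nearest known readings.

    Assumes equally spaced timestamps. Leading and trailing gaps stay None
    because there is no reading on both sides to interpolate between.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            frac = (i - left) / span
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled


def clean_series(values: List[Optional[float]], threshold: float = 3.5) -> List[Optional[float]]:
    """Hypothetical pipeline: mask flagged outliers as missing, then fill all gaps linearly."""
    flags = flag_point_outliers(values, threshold)
    masked = [None if flagged else v for v, flagged in zip(values, flags)]
    return fill_gaps_linear(masked)


if __name__ == "__main__":
    readings = [10.0, 11.0, None, 13.0, 250.0, 15.0]  # made-up example values
    print(clean_series(readings))  # → [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
```

Here the spike of 250.0 is flagged and treated as missing. It is then filled together with the original gap. Masking outliers before interpolation keeps a single erroneous reading from distorting the values interpolated around it.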