ARC Centre of Excellence for Mathematical & Statistical Frontiers (ACEMS), Australia; Institute for Future Environments, Queensland University of Technology, Brisbane, Queensland, Australia; School of Mathematical Sciences, Science and Engineering Faculty, Queensland University of Technology, Brisbane, Queensland, Australia.
ARC Centre of Excellence for Mathematical & Statistical Frontiers (ACEMS), Australia; Institute for Future Environments, Queensland University of Technology, Brisbane, Queensland, Australia.
Sci Total Environ. 2019 May 10;664:885-898. doi: 10.1016/j.scitotenv.2019.02.085. Epub 2019 Feb 6.
Monitoring the water quality of rivers is increasingly conducted using automated in situ sensors, enabling timelier identification of unexpected values or trends. However, the data are confounded by anomalies caused by technical issues, for which the volume and velocity of data preclude manual detection. We present a framework for automated anomaly detection in high-frequency water-quality data from in situ sensors, using turbidity, conductivity and river level data collected from rivers flowing into the Great Barrier Reef. After identifying end-user needs and defining anomalies, we ranked anomaly importance and selected suitable detection methods. High-priority anomalies included sudden isolated spikes and level shifts, most of which were classified correctly by regression-based methods such as autoregressive integrated moving average (ARIMA) models. However, incorporating multiple water-quality variables as covariates reduced performance because of complex relationships among the variables. Classifications of drift and of periods of anomalously low or high variability were more often correct when we applied mitigation, which replaces anomalous measurements with their forecasts when forecasting subsequent values, but mitigation also inflated false positive rates. Feature-based methods also performed well on high-priority anomalies but, like the regression-based methods, were less proficient at detecting lower-priority anomalies, resulting in high false negative rates. Unlike regression-based methods, however, all feature-based methods produced low false positive rates and have the benefit of not requiring training or optimization. Rule-based methods successfully detected a subset of lower-priority anomalies, specifically impossible values and missing observations. We therefore suggest that a combination of methods will provide optimal performance in terms of correct anomaly detection, whilst minimizing false detection rates.
Furthermore, our framework emphasizes the importance of communication between end-users and anomaly detection developers for optimal outcomes with respect to both detection performance and end-user application. The framework is also highly transferable to other types of high-frequency time-series data and anomaly detection applications.
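The rule-based and regression-based steps summarized in the abstract can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses a first-order autoregressive model, AR(1), fitted by ordinary least squares as a simplified stand-in for the ARIMA models named above. An observation is flagged when its one-step-ahead forecast error exceeds k residual standard deviations. Mitigation is applied optionally by feeding the forecast, rather than the flagged value, into the next forecast. The function names (`rule_based_flags`, `fit_ar1`, `ar1_flags`), the threshold k = 3, the plausibility bounds and the example data are all hypothetical.

```python
import math


def rule_based_flags(values, lower, upper):
    """Label each value 'missing', 'impossible' or None (passes the rules).

    `lower` and `upper` are user-chosen plausibility bounds (hypothetical here),
    e.g. turbidity and conductivity cannot be negative.
    """
    flags = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            flags.append("missing")
        elif v < lower or v > upper:
            flags.append("impossible")
        else:
            flags.append(None)
    return flags


def fit_ar1(train):
    """Least-squares fit of x[t] = c + phi * x[t-1] + e[t] on anomaly-free data.

    Returns (c, phi, sigma), where sigma is the residual standard deviation.
    Assumes at least 4 non-constant, non-missing training values.
    """
    x_prev, x_next = train[:-1], train[1:]
    n = len(x_prev)
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    sxx = sum((a - mean_prev) ** 2 for a in x_prev)
    sxy = sum((a - mean_prev) * (b - mean_next) for a, b in zip(x_prev, x_next))
    phi = sxy / sxx
    c = mean_next - phi * mean_prev
    resid = [b - (c + phi * a) for a, b in zip(x_prev, x_next)]
    sigma = math.sqrt(sum(r * r for r in resid) / (n - 2))  # 2 fitted parameters
    return c, phi, sigma


def ar1_flags(series, c, phi, sigma, k=3.0, mitigate=False):
    """Flag points whose one-step-ahead forecast error exceeds k * sigma.

    With mitigate=True, a flagged observation is replaced by its forecast
    before the next forecast is made, so the anomaly does not propagate.
    Assumes `series` has no missing values (run the rule-based check first).
    """
    flags = [False] * len(series)
    prev = series[0]
    for t in range(1, len(series)):
        forecast = c + phi * prev
        if abs(series[t] - forecast) > k * sigma:
            flags[t] = True
            prev = forecast if mitigate else series[t]
        else:
            prev = series[t]
    return flags


if __name__ == "__main__":
    # Rule-based: missing observations and an impossible (negative) turbidity value.
    print(rule_based_flags([1.2, None, -0.5, float("nan"), 3.0], lower=0.0, upper=4000.0))
    # -> [None, 'missing', 'impossible', 'missing', None]

    # Regression-based: train on a short segment assumed to be anomaly-free.
    train = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.9, 10.1, 10.0]
    c, phi, sigma = fit_ar1(train)
    spike = [10.0, 10.1, 50.0, 10.0, 9.9]
    print(ar1_flags(spike, c, phi, sigma))                 # -> [False, False, True, True, False]
    print(ar1_flags(spike, c, phi, sigma, mitigate=True))  # -> [False, False, True, False, False]
```

In the spike example, the unmitigated detector also flags the point after the spike. This happens because the anomalous value is fed into the next AR(1) forecast. With mitigation, only the spike itself is flagged. Conversely, mitigation keeps forecasting from replaced values, so a sustained genuine change can go on generating flags. This is one way in which mitigation can raise false positive rates.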