School of Hydraulic Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China.
Sci Total Environ. 2022 Jul 1;828:154284. doi: 10.1016/j.scitotenv.2022.154284. Epub 2022 Mar 2.
This study presents a new stacking ensemble model for contamination event detection using multiple water quality parameters. The stacking model consists of several machine learning base predictors and a meta-predictor. It is trained with cross-validation to capture different features in multiple water quality parameters and is then used to predict water quality. For each water quality parameter, the residuals between predicted and measured data are classified to identify anomalies, with thresholds derived from the sequential model-based optimization method and detection probabilities updated using Bayesian analysis. Alarms derived from individual water quality parameters are fused to enhance the anomaly signals and improve detection accuracy. The proposed stacking-based method is evaluated using a data set of six water quality parameters from a real water distribution system with randomly simulated events. The stacking-based method detected 2496 of a total of 2500 events without a false alarm. The results show that the stacking method outperforms an artificial neural network (ANN) benchmark method in contamination event detection, with a higher true positive rate, a lower false positive rate and a higher F1 score. This implies that the stacking method shows great promise for detecting contamination events in water distribution systems.
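The detection stage summarized in the abstract (residual classification, Bayesian updating of the event probability, and fusion of per-parameter alarms) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed thresholds, the assumed true and false positive rates (`tpr`, `fpr`), the prior, the probability clipping bounds, and the vote-based fusion rule are all hypothetical choices made for illustration, and the predictions are taken as given rather than produced by a stacking model.

```python
from typing import Dict, List


def classify_residuals(measured: List[float], predicted: List[float],
                       threshold: float) -> List[bool]:
    """Flag each time step whose absolute residual exceeds the threshold.

    The threshold is a fixed, assumed value here; the abstract derives it
    with sequential model-based optimization.
    """
    return [abs(m - p) > threshold for m, p in zip(measured, predicted)]


def bayes_update(prior: float, outlier: bool, tpr: float, fpr: float) -> float:
    """One Bayes step for P(event), given an outlier or normal observation."""
    if outlier:
        num = tpr * prior
        den = tpr * prior + fpr * (1.0 - prior)
    else:
        num = (1.0 - tpr) * prior
        den = (1.0 - tpr) * prior + (1.0 - fpr) * (1.0 - prior)
    return num / den


def event_probabilities(flags: List[bool], prior: float = 0.01,
                        tpr: float = 0.9, fpr: float = 0.05,
                        floor: float = 1e-3, ceil: float = 0.999) -> List[float]:
    """Recursively update the event probability along a flag sequence.

    The probability is clipped to [floor, ceil] (an assumed safeguard) so
    that it can still respond after a long run of identical flags.
    """
    probs = []
    p = prior
    for flag in flags:
        p = bayes_update(p, flag, tpr, fpr)
        p = min(max(p, floor), ceil)
        probs.append(p)
    return probs


def fuse_alarms(prob_by_param: Dict[str, List[float]],
                alarm_level: float = 0.5, min_votes: int = 2) -> List[bool]:
    """Raise a fused alarm when at least min_votes parameters alarm at once.

    Voting is an assumed fusion rule chosen for simplicity.
    """
    n_steps = len(next(iter(prob_by_param.values())))
    fused = []
    for t in range(n_steps):
        votes = sum(1 for probs in prob_by_param.values()
                    if probs[t] >= alarm_level)
        fused.append(votes >= min_votes)
    return fused


# Toy data (invented): chlorine and conductivity deviate at steps 2-4,
# pH stays close to its prediction.
measured = {
    "chlorine": [1.0, 1.0, 0.6, 0.5, 0.5, 1.0],
    "conductivity": [500.0, 500.0, 560.0, 570.0, 565.0, 500.0],
    "pH": [7.2, 7.2, 7.25, 7.2, 7.15, 7.2],
}
predicted = {
    "chlorine": [1.0] * 6,
    "conductivity": [500.0] * 6,
    "pH": [7.2] * 6,
}
thresholds = {"chlorine": 0.2, "conductivity": 30.0, "pH": 0.2}

probabilities = {
    name: event_probabilities(
        classify_residuals(measured[name], predicted[name], thresholds[name]))
    for name in measured
}
fused = fuse_alarms(probabilities)
print(fused)
```

In this toy setting a single outlier does not trigger an alarm: the event probability of a parameter has to accumulate over consecutive outliers before it crosses `alarm_level`, and the fused alarm additionally requires agreement between parameters, which is one simple way to suppress false alarms from a single noisy sensor.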