Aljehani Fahad, N'Doye Ibrahima, Hong Pei-Ying, Monjed Mohammad Khalil, Laleg-Kirati Taous-Meriem
Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia.
Environmental Science and Engineering Program, Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), 23955-6900, Thuwal, Saudi Arabia.
Sci Rep. 2024 Dec 28;14(1):31218. doi: 10.1038/s41598-024-82598-y.
Reduced bacterial concentration in wastewater is a key indicator of the efficacy of water resource recovery facilities (WRRFs). However, monitoring bacterial concentrations in real time at each stage of a WRRF is challenging because it requires collecting water samples and processing them offline. Although a few studies have proposed data-driven models to predict bacterial concentrations, generalizing these models to unseen data from different WRRFs remains difficult. This paper proposes a neural-network-based calibration approach that adapts the optimal models across various WRRFs in Saudi Arabia to estimate bacterial concentrations at the influent and effluent stages. The calibration relies on an out-of-distribution (OOD) framework applied to the physicochemical water parameters (e.g., pH, chemical oxygen demand (COD), total dissolved solids (TDS), turbidity, conductivity), with a design threshold chosen based on the data distribution of the received unseen samples. The framework keeps updating the trained neural network model as new samples arrive, so that bacterial concentration estimates remain accurate. We tested the proposed calibration scheme on four WRRF datasets in Saudi Arabia and compared it with two references: the model before calibration and the model after calibration without OOD. The before-calibration model was built with a traditional, optimal neural network approach, typically considered the conventional method for building neural networks. In the after-calibration-without-OOD case, the model was retrained continually without explicitly checking for the OOD condition. The results showed that, for the selected baseline WRRF, the proposed calibration framework with the OOD scheme improved the worst-case influent bacteria concentration estimation by [Formula: see text] relative to before calibration and by [Formula: see text] relative to after calibration without OOD. Similarly, the worst-case effluent bacteria concentration estimation was enhanced by [Formula: see text] relative to before calibration and by [Formula: see text] relative to after calibration without OOD. Our findings highlight the importance of integrating the calibration framework with neural network approaches to achieve model generalization.
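The OOD-gated calibration described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the OOD score is computed or how the threshold is set, so everything here is an assumption made for illustration: the score is the largest per-feature z-score of a new sample against the baseline data, the threshold of 3.0 is arbitrary, only samples flagged as OOD trigger an update, and `fit_reference`, `ood_score`, `calibrate_if_ood` and the `update_model` callback are hypothetical names, not the authors' implementation.

```python
import numpy as np

def fit_reference(X_base):
    """Per-feature mean and standard deviation of the baseline WRRF data."""
    mu = X_base.mean(axis=0)
    sigma = X_base.std(axis=0) + 1e-12  # guard against zero variance
    return mu, sigma

def ood_score(X_new, mu, sigma):
    """Assumed OOD score: largest absolute z-score across features, per sample."""
    return np.abs((X_new - mu) / sigma).max(axis=1)

def calibrate_if_ood(update_model, X_new, y_new, mu, sigma, threshold=3.0):
    """Call update_model only on samples whose score exceeds the threshold.

    update_model is a hypothetical callback standing in for whatever
    retraining step is applied to the neural network.
    """
    mask = ood_score(X_new, mu, sigma) > threshold
    if mask.any():
        update_model(X_new[mask], y_new[mask])
    return mask

# Toy data with two features (pH, COD); the values are invented for illustration.
X_base = np.array([[7.0, 100.0], [7.2, 110.0], [6.8, 90.0],
                   [7.1, 105.0], [6.9, 95.0]])
mu, sigma = fit_reference(X_base)

X_new = np.array([[7.05, 102.0],   # close to the baseline distribution
                  [9.00, 100.0]])  # pH far outside the baseline range
y_new = np.array([3.1, 4.8])       # placeholder bacterial concentrations

updates = []  # records what would be passed to the retraining step
mask = calibrate_if_ood(lambda X, y: updates.append((X, y)),
                        X_new, y_new, mu, sigma)
```

In this sketch, the "after calibration without OOD" reference would correspond to calling `update_model` on every incoming batch regardless of `mask`.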