Algae & Organic Matter Laboratory, School of Chemical Engineering, University of New South Wales, Sydney 2052, Australia; UNESCO Centre for Membrane Science & Technology, School of Chemical Engineering, University of New South Wales, Sydney 2052, Australia.
School of Chemical Engineering, University of New South Wales, Sydney 2052, Australia.
Water Res. 2023 May 15;235:119874. doi: 10.1016/j.watres.2023.119874. Epub 2023 Mar 12.
Five machine learning algorithms, namely Decision Tree (DT), Random Forest (RF), Multivariable Linear Regression (MLR), Support Vector Regression (SVR), and Gaussian Process Regression (GPR), were applied to predict the performance of a multimedia filter as a function of raw water quality and plant operating variables. The models were trained on data collected over a seven-year period, covering water quality and operating variables including true colour, turbidity, plant flow, and chemical doses of chlorine, KMnO₄, FeCl₃, and cationic polymer (PolyDADMAC). The algorithms predicted best with a 1-day time lag between the input variables and the unit filter run volume (UFRV). Of the models tested, the RF algorithm with grid search, using the inputs listed above at a 1-day time lag, predicted UFRV most reliably, with an RMSE of 31.58 and an R² of 0.98. RF with grid search also combined the shortest training time with strong prediction accuracy and, in ROC-AUC analysis, good forecasting of events during extreme wet weather (AUC over 0.8). Random Forest with grid search and a 1-day time lag is therefore an effective and robust machine learning algorithm for predicting filter performance, and it can support water treatment operators' decision-making by providing real-time warning of potential turbidity breakthrough from the filters.
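The 1-day time lag and the two reported error metrics can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the sample values are hypothetical, and it assumes that the reported R is the coefficient of determination and that "1-day lag" means the inputs from day t-1 are paired with the UFRV observed on day t.

```python
import math


def make_lagged_pairs(inputs, ufrv, lag_days=1):
    """Pair the inputs from day t - lag_days with the UFRV observed on day t.

    Assumption: both sequences hold one record per day, in date order.
    """
    if len(inputs) != len(ufrv):
        raise ValueError("inputs and ufrv must have the same length")
    if lag_days < 0 or lag_days >= len(ufrv):
        raise ValueError("lag_days must be between 0 and len(ufrv) - 1")
    if lag_days == 0:
        return list(inputs), list(ufrv)
    # Drop the last lag_days inputs and the first lag_days targets.
    return list(inputs[:-lag_days]), list(ufrv[lag_days:])


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily records: (turbidity, plant flow); values are made up.
daily_inputs = [(2.1, 180.0), (3.4, 175.0), (9.8, 160.0), (4.0, 170.0)]
daily_ufrv = [410.0, 395.0, 360.0, 250.0]

X, y = make_lagged_pairs(daily_inputs, daily_ufrv, lag_days=1)
# X[i] now holds the inputs of the day before the UFRV stored in y[i].
```

A regressor such as a Random Forest would then be fitted on `X` and `y`, and `rmse` and `r_squared` would be computed on held-out days.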