Bo-Qi Liu, Ding-Jie Zhou, Yang Zhao, Long-Yu Shi
State Key Laboratory of Regional and Urban Ecology, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen, Fujian, China.
Rural Revitalization College, Fujian Agriculture and Forestry University, Fuzhou, China.
PLoS One. 2025 Jun 10;20(6):e0325234. doi: 10.1371/journal.pone.0325234. eCollection 2025.
Effluent quality prediction is critical for optimizing Wastewater Treatment Plant (WWTP) operations, ensuring regulatory compliance, and promoting environmental sustainability. Using data from a WWTP in Zhuhai, China, this study evaluates five supervised learning models: AdaBoost, Backpropagation Neural Network (BP-NN), Support Vector Regression (SVR), XGBoost, and Gradient Boosting (GB). The Effluent Quality Index (EQI), which integrates multiple pollutant concentrations and environmental impacts, was used as the target variable. The models were trained and tested on 84 monthly records, and their performance was compared using R², Mean Absolute Percentage Error (MAPE), and Mean Bias Error (MBE). XGBoost achieved the best balance between accuracy and robustness, with the lowest MAPE (6.11%) and a high R² (0.813), while SVR achieved the highest fitting accuracy (R² = 0.826) but showed limitations in error control. Although we employed GridSearchCV with cross-validation to optimize hyperparameters and ensure a fair model comparison, the study is limited by its reliance on data from a single WWTP and by the relatively small dataset (84 records). Nevertheless, the findings provide valuable insights into selecting effective machine learning models for effluent quality prediction, supporting data-driven decision-making in wastewater management and advancing intelligent process optimization in WWTPs.
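For readers unfamiliar with the three comparison metrics, the following is a minimal sketch of how R², MAPE, and MBE are commonly computed. It is not the authors' code: the function names and the sample values are hypothetical, and the sign convention for MBE (predicted minus observed) is an assumption, since the abstract does not state which convention was used.

```python
from statistics import mean


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def mape(observed, predicted):
    """Mean absolute percentage error, in percent.

    Observed values must be nonzero.
    """
    return 100.0 * mean(abs((o - p) / o) for o, p in zip(observed, predicted))


def mbe(observed, predicted):
    """Mean bias error.

    Assumed convention: predicted minus observed, so a positive
    value indicates over-prediction on average.
    """
    return mean(p - o for o, p in zip(observed, predicted))


# Made-up illustrative values, not data from the study.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(f"R2   = {r2_score(observed, predicted):.4f}")
print(f"MAPE = {mape(observed, predicted):.2f}%")
print(f"MBE  = {mbe(observed, predicted):.2f}")
```

In this toy example the errors of +10 and -10 cancel in MBE but not in MAPE. That is why a bias metric is typically reported alongside an absolute-error metric: a model can have near-zero bias while still making sizeable individual errors.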