Khalil Mostafa, AlSayed Ahmed, Liu Yang, Vanrolleghem Peter A
Department of Civil and Environmental Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada.
Department of Civil and Environmental Engineering, McCormick School of Engineering, Northwestern University, United States.
Water Res. 2023 Oct 15;245:120667. doi: 10.1016/j.watres.2023.120667. Epub 2023 Sep 24.
Nitrous oxide (N₂O) emissions may account for up to 80 % of a wastewater treatment plant's (WWTP) total carbon footprint. Given the complexity of the pathways involved, mechanistic models often still fail to depict N₂O process dynamics precisely. Data-driven methods for predicting N₂O emissions hold substantial potential as an alternative. However, a comprehensive approach is still lacking, which impedes full-scale application. Therefore, this study develops a comprehensive approach for using machine learning to perform online process modeling of N₂O emissions. The approach is tested on a long-term N₂O emission dataset from a full-scale WWTP. Uniquely, the proposed approach emphasizes not only model accuracy but also model complexity, computational speed, and interpretability, equipping operators with the insights needed for informed corrective actions. Algorithms with varying levels of complexity and interpretability were considered, including k-Nearest Neighbors (kNN), decision trees, ensemble learning models, and deep neural networks (DNN). Furthermore, a parametric multivariate outlier removal method was adjusted to account for the statistical distributions of the data, significantly reducing data loss. An effective feature selection methodology identified a trade-off between data acquisition, model performance, and complexity: the number of features was reduced by 40 %, decreasing data collection cost, model complexity, and computational burden without a significant effect on modeling accuracy. The best performing models are kNN (R² = 0.88), AdaBoost (R² = 0.94), and DNN (R² = 0.90). The feature importance of the models was analyzed and compared with process knowledge to test interpretability and to guide N₂O mitigation decisions.
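The abstract names kNN as the simplest of the evaluated model families and reports model fit as a coefficient of determination. As a minimal illustration of those two ideas only, the sketch below implements kNN regression and the R² score in plain Python and runs them on synthetic data. Everything in it is an assumption made for illustration: the function names (`knn_predict`, `r_squared`, `make_synthetic`), the two generic input features, the target function, the value k = 5, and the 300/100 hold-out split are hypothetical, and none of it reproduces the study's dataset, preprocessing, or tuned models.

```python
import math
import random


def knn_predict(train_X, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training points."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def make_synthetic(n, seed=0):
    """Hypothetical stand-in data: two inputs in [0, 1] and a noisy nonlinear target."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        a, b = rng.random(), rng.random()
        X.append((a, b))
        y.append(math.sin(3.0 * a) + 0.5 * b + rng.gauss(0.0, 0.05))
    return X, y


X, y = make_synthetic(400)
# Hold out the last 100 rows for evaluation; fit on the first 300.
train_X, train_y = X[:300], y[:300]
test_X, test_y = X[300:], y[300:]

predictions = [knn_predict(train_X, train_y, q, k=5) for q in test_X]
score = r_squared(test_y, predictions)
print(f"kNN (k=5) R^2 on held-out synthetic data: {score:.3f}")
```

With real plant data the inputs would normally be scaled before computing distances, since kNN is sensitive to feature units; the synthetic inputs here already share the same [0, 1] range, so that step is skipped.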