Department of Civil Engineering, Kyung Hee University, Yongin-si, Gyeonggi-do, Republic of Korea.
J Environ Manage. 2021 Dec 15;300:113795. doi: 10.1016/j.jenvman.2021.113795. Epub 2021 Sep 21.
This study applied machine-learning (ML) modeling to activated sludge microbiome data to predict the operational characteristics of biological unit processes (i.e., anaerobic, anoxic, and aerobic) in a full-scale municipal wastewater treatment plant. An ML application pipeline with optimization strategies (e.g., model selection, input data preprocessing, and hyperparameter tuning) could significantly improve prediction performance. Comparative analysis of the ML prediction performance suggested that linear models (support vector machine and logistic regression) achieved high prediction performance (93% accuracy), comparable to that of non-linear models such as random forest. Feature importance analysis using the linear ML models identified microbial taxa specifically associated with the anoxic process, many of which (e.g., Ferruginibacter) were found to have ecologically important genomic and phenotypic characteristics (e.g., for nitrate reduction). Time-series microbial community dynamics demonstrated that the taxa identified using ML occurred frequently and dominated the anoxic process over time, thus representing the core nitrate-reducing community. Despite the general dominance of the core community over time, the analysis further revealed seasonal successional patterns of distinct sub-groups, indicating that the functional contribution of each sub-group to the overall nitrate-reducing potential of the system differed by season. Overall, the results of this study suggest that ML modeling holds great promise for the predictive identification and understanding of key microbial players governing the functioning and stability of biological wastewater systems.
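The idea of reading feature importance from a linear classifier can be illustrated with a minimal sketch. This is not the authors' pipeline: it is a from-scratch logistic regression on synthetic data, written with the Python standard library only. The helper names (`make_synthetic_samples`, `fit_logistic_regression`, `rank_taxa_by_weight`), the number of taxa, the planted marker taxon, and all hyperparameter values are hypothetical, and the inputs are assumed to be standardized abundance values. Model selection, preprocessing, and hyperparameter tuning are deliberately left out.

```python
import math
import random


def make_synthetic_samples(n_per_class=40, n_taxa=6, marker_taxon=2, seed=0):
    """Build a toy abundance table (assumed standardized values).

    Label 1 stands for 'anoxic', label 0 for any other process.
    The taxon at index `marker_taxon` is artificially enriched in
    anoxic samples; every other taxon is pure noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_taxa)]
            if label == 1:
                row[marker_taxon] += 3.0  # planted signal (assumption)
            X.append(row)
            y.append(label)
    return X, y


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.2, epochs=500, l2=0.01):
    """Fit weights and bias by batch gradient descent with an L2 penalty."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Return 0/1 labels using a 0.5 probability threshold."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


def rank_taxa_by_weight(w):
    """Order taxon indices by absolute coefficient, largest first."""
    return sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)


if __name__ == "__main__":
    X, y = make_synthetic_samples()
    w, b = fit_logistic_regression(X, y)
    preds = predict(X, w, b)
    train_acc = sum(p == t for p, t in zip(preds, y)) / len(y)
    print("training accuracy:", round(train_acc, 3))
    print("taxa ranked by |weight|:", rank_taxa_by_weight(w))
```

In this toy setting the largest positive coefficient falls on the planted marker taxon, which mirrors how coefficients of a linear model can point to taxa associated with one process. On real microbiome data the ranking would depend on preprocessing and regularization, and accuracy should be measured on held-out samples rather than on the training set.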