Department of Civil Engineering, College of Engineering, Kyung Hee University, Yongin, Republic of Korea.
Department of Civil and Environmental Engineering, Yonsei University, Seoul, Republic of Korea.
J Hazard Mater. 2024 Jul 5;472:134513. doi: 10.1016/j.jhazmat.2024.134513. Epub 2024 May 3.
Groundwater (GW) quality monitoring is vital for sustainable water resource management. The present study introduced a metagenome-derived machine learning (ML) model aimed at enhancing the predictive understanding and diagnostic interpretation of petroleum-associated GW pollution. In this framework, taxonomic and metabolic profiles derived from GW metagenomes were combined to form the input dataset. By optimizing data integration, model selection, and parameter tuning, we significantly increased the diagnostic accuracy for petroleum-polluted GW. Explainable artificial intelligence (XAI) techniques identified petroleum degradation pathways and Rhodocyclaceae as strong predictors of a pollution diagnosis. Metagenomic analysis corroborated the presence of gene operons encoding aminobenzoate and xylene biodegradation pathways within the de novo assembled genome of Rhodocyclaceae. Our genome-centric metagenomic analysis thus clarified the ecological interactions of the microbiomes that break down petroleum contaminants, validating the ML-based diagnostic results. This metagenome-derived ML framework not only enhances the predictive diagnosis of petroleum pollution but also offers interpretable insights into the interaction between microbiomes and petroleum. The proposed ML framework shows great promise as a science-based strategy for on-site monitoring and remediation of GW pollution.
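The abstract does not state which classifier, tuning scheme, or explainable-AI method was used. The standard-library sketch below only illustrates the general pattern it describes: merging each sample's taxonomic and metabolic (pathway) profiles into one input matrix, tuning a hyperparameter by cross-validation, and ranking features by how much shuffling each one degrades accuracy. The k-nearest-neighbour classifier, leave-one-out tuning of k, permutation importance (standing in for the XAI step), every function name, and the toy data are hypothetical assumptions, not the authors' method.

```python
import random
from statistics import mean


def integrate_profiles(taxa, pathways):
    """Merge per-sample taxonomic and pathway abundance dicts into one matrix.

    Prefixes keep the two feature namespaces apart; a feature missing from a
    sample is filled with 0.0 (an assumption, not the paper's rule).
    """
    names = sorted({f"tax:{k}" for s in taxa for k in s}
                   | {f"path:{k}" for s in pathways for k in s})
    rows = []
    for t, p in zip(taxa, pathways):
        merged = {f"tax:{k}": v for k, v in t.items()}
        merged.update({f"path:{k}": v for k, v in p.items()})
        rows.append([merged.get(n, 0.0) for n in names])
    return names, rows


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training rows closest to `query`."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, query)), label)
        for row, label in zip(train_x, train_y)
    )[:k]
    votes = [label for _, label in nearest]
    return max(sorted(set(votes)), key=votes.count)


def loo_accuracy(x, y, k, queries=None):
    """Leave-one-out accuracy: each sample is predicted from all the others.

    `queries` optionally replaces the held-out row (e.g. with a permuted copy)
    while the training rows stay unmodified.
    """
    queries = x if queries is None else queries
    hits = sum(
        knn_predict(x[:i] + x[i + 1:], y[:i] + y[i + 1:], queries[i], k) == y[i]
        for i in range(len(x))
    )
    return hits / len(x)


def tune_k(x, y, candidates=(1, 3, 5)):
    """Pick the k with the best LOO accuracy; ties go to the smaller k."""
    return max(candidates, key=lambda k: (loo_accuracy(x, y, k), -k))


def permutation_importance(x, y, k, n_repeats=10, seed=0):
    """Mean drop in LOO accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    base = loo_accuracy(x, y, k)
    scores = []
    for j in range(len(x[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in x]
            rng.shuffle(col)
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(x, col)]
            drops.append(base - loo_accuracy(x, y, k, queries=permuted))
        scores.append(mean(drops))
    return scores


# Synthetic toy data (hypothetical, not from the study): taxon_A tracks the
# label, while pathway_X is constant and therefore uninformative.
TAXA = [{"taxon_A": a} for a in (0.9, 0.8, 0.85, 0.1, 0.2, 0.15)]
PATHWAYS = [{"pathway_X": 0.5} for _ in range(6)]
LABELS = ["polluted"] * 3 + ["clean"] * 3

if __name__ == "__main__":
    names, x = integrate_profiles(TAXA, PATHWAYS)
    k = tune_k(x, LABELS)
    for name, score in zip(names, permutation_importance(x, LABELS, k)):
        print(f"{name}: {score:.3f}")
```

Run as a script, it prints one importance score per prefixed feature. On the toy data, the constant pathway feature scores 0 and the label-tracking taxon scores above 0. The prefixes (`tax:`, `path:`) are a simple way to keep the two profile types distinguishable after merging, so an importance ranking can still be traced back to a taxon or a pathway.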