Díaz-González L, Aguilar-Rodríguez R A, Pérez-Sansalvador J C, Lakouari N
Centro de Investigación en Ciencias, Universidad Autónoma del Estado de Morelos, Cuernavaca, Morelos 62209, Mexico.
Maestría en Optimización y Cómputo Aplicado, Universidad Autónoma del Estado de Morelos, Cuernavaca, Morelos 62209, Mexico.
J Contam Hydrol. 2025 Feb;269:104498. doi: 10.1016/j.jconhyd.2025.104498. Epub 2025 Jan 3.
This study addresses the critical challenge of assessing the quality of groundwater and surface water, which are essential resources for many societal needs. The main contribution of this study is the application of machine learning models to water quality evaluation, using a national database from Mexico that includes groundwater, lotic (flowing), lentic (standing), and coastal water quality parameters. Notably, no comparable water quality classification system currently exists. Five advanced machine learning techniques were employed: extreme gradient boosting (XGB), support vector machines, K-nearest neighbors, decision trees, and multinomial logistic regression. Model performance was evaluated using accuracy, precision, and F1 score. The decision tree models were the most effective across all water body types, closely followed by XGB. The decision tree models were therefore integrated into the AQuA-P software, which is currently the only software of its kind. We recommend using these water quality classification models through the AQuA-P software to support informed decision-making in water quality management. The software provides a probability-based classification system that contributes to a deeper understanding of water quality dynamics. Lastly, an open-access repository containing all the datasets and Python notebooks used in the analysis is provided, so that the methodology can be readily adapted and applied to other datasets worldwide.
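As an illustration of how multiclass classifiers can be compared with the metrics named above, the following is a minimal sketch in plain Python. It is not the authors' code: the function name `evaluate`, the class labels ("good", "acceptable", "polluted"), and the choice of macro-averaging for precision and F1 are all hypothetical assumptions made for this example, since the abstract does not state how the per-class scores were averaged.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision and F1 for a multiclass problem.

    Hypothetical helper for illustration; macro-averaging is an assumption,
    not necessarily the averaging scheme used in the study.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    # True positives per class: samples where prediction equals the true label.
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)   # how often each class was predicted
    true_count = Counter(y_true)   # how often each class actually occurs
    precisions, f1s = [], []
    for c in labels:
        prec = tp[c] / pred_count[c] if pred_count[c] else 0.0
        rec = tp[c] / true_count[c] if true_count[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        f1s.append(f1)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision_macro": sum(precisions) / len(labels),
        "f1_macro": sum(f1s) / len(labels),
    }


# Hypothetical water quality labels and predictions from two models.
y_true = ["good", "good", "polluted", "acceptable", "polluted", "good"]
predictions = {
    "model_a": ["good", "acceptable", "polluted", "acceptable", "good", "good"],
    "model_b": ["good", "good", "good", "good", "good", "good"],
}
scores = {name: evaluate(y_true, pred) for name, pred in predictions.items()}
# Pick the model with the highest macro F1 (an assumed selection criterion).
best = max(scores, key=lambda name: scores[name]["f1_macro"])
print(best, scores[best])
```

Macro-averaging weights every class equally, which matters when some water quality classes are rare; a weighted average by class frequency would be an equally plausible alternative if the study reports it that way.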