Basharat Usman, Zhang Wenjing, Han Cuihong, Khan Shoukat Husain, Abbasi Arshad, Mahroof Sehrish, Li Shuxin
Key Laboratory of Groundwater Resources and Environment (Jilin University), Ministry of Education, Changchun 130021, China; College of New Energy and Environment, Jilin University, Changchun 130021, China.
Ecotoxicol Environ Saf. 2025 Sep 1;302:118610. doi: 10.1016/j.ecoenv.2025.118610. Epub 2025 Jul 2.
Groundwater quality monitoring is crucial for protecting the environment and human health. Machine learning (ML) offers substantial potential for improving groundwater quality prediction, classification, and the identification of pollution indicators. This study evaluates base ML algorithms and stacking ensemble classifiers (meta-classifiers) using data from 90 groundwater samples collected in District Bagh, Azad Kashmir, Pakistan, with the aim of establishing a reliable method for predicting groundwater quality class. Six supervised classifiers were used: Logistic Regression (LR), K-Nearest Neighbours (KNN), Decision Trees (DT), Support Vector Machines (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGB). These classifiers and their corresponding meta-classifiers (Meta-LR, Meta-KNN, Meta-DT, Meta-SVM, Meta-RF, and Meta-XGB) were developed and compared on their effectiveness in classifying and predicting groundwater quality. Performance was assessed using precision, recall, F1-score, accuracy, R², RMSE, and ROC curves. Among all classifiers, SVM and its meta-classifier (Meta-SVM) were the most effective, achieving the highest accuracy (0.85-0.89), with an F1-score of 0.88-0.89, R² of 0.88-1, RMSE of 6.72, and Area Under the Curve (AUC) of 0.795. Meta-classifiers outperformed their base models for LR (0.85-0.92), SVM (0.88-1.00), and XGB (0.52-0.89). The study also identified key pollution indicators influencing groundwater quality in the area, such as Total Dissolved Solids (TDS), Sulphate (SO₄²⁻), and Nitrate (NO₃⁻), all of which showed an increasing trend over time. The research highlights the potential of ML techniques, particularly SVM and Meta-SVM, for predicting groundwater quality from key pollution indicators.
The findings underscore the importance of ongoing monitoring and predictive modeling in managing groundwater resources effectively and mitigating pollution impacts. Future applications could refine models and expand datasets to enhance predictive accuracy and applicability across regions and conditions.
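As an illustration of the classification metrics named in the abstract (precision, recall, F1-score, and accuracy), the following is a minimal Python sketch of how they are computed from true and predicted class labels. It is not the authors' code. The function name classification_metrics, the labels "suitable" and "unsuitable", and the eight sample values are hypothetical, and the abstract does not state which class labels or averaging scheme the study used.

```python
from typing import Dict, Hashable, Sequence


def classification_metrics(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    positive: Hashable,
) -> Dict[str, float]:
    """Precision, recall, F1 and accuracy, treating `positive` as the target class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    # Fall back to 0.0 when a denominator is zero (class never predicted or never present).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": correct / len(pairs),
    }


# Hypothetical labels for eight samples; these are NOT data from the study.
y_true = ["unsuitable"] * 4 + ["suitable"] * 4
y_pred = [
    "unsuitable", "unsuitable", "unsuitable", "suitable",
    "suitable", "suitable", "suitable", "unsuitable",
]
metrics = classification_metrics(y_true, y_pred, positive="unsuitable")
print(metrics)
```

With more than two quality classes, the same function could be called once per class and the results averaged (macro-averaging); whether the study did so is an assumption, not something the abstract reports.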