Islam Abu Reza Md Towfiqul, Mamun Md Abdullah-Al, Hasan Mehedi, Aktar Mst Nazneen, Uddin Md Nashir, Siddique Md Abu Bakar, Chowdhury Mohaiminul Haider, Islam Md Saiful, Bari A B M Mainul, Idris Abubakr M, Senapathi Venkatramanan
Department of Disaster Management, Begum Rokeya University, Rangpur 5400, Bangladesh; Department of Development Studies, Daffodil International University, Dhaka 1216, Bangladesh; Department of Earth and Environmental Science, College of Science, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, Republic of Korea.
Department of Data Science, Tampere University, Finland.
J Contam Hydrol. 2025 Feb;269:104480. doi: 10.1016/j.jconhyd.2024.104480. Epub 2024 Dec 10.
Investigating the potential of novel data mining algorithms (DMAs) for modeling groundwater quality is important for groundwater resource management, especially in the coastal region of Bangladesh, where groundwater is highly contaminated. This work investigated the applicability of three DMAs, Gaussian Process Regression (GPR), Bayesian Ridge Regression (BRR) and Artificial Neural Network (ANN), for predicting groundwater quality in coastal areas. Optuna-based hyperparameter optimization is proposed to improve model accuracy, with Optuna-GPR and Optuna-BRR serving as benchmark models. Cross-validation (CV) and bootstrapping (B) were combined with these algorithms to build six predictive models. The entropy-based coastal groundwater quality index (ECWQI) was converted into a normalized index (ECWQIn) and divided into five classes from very poor to excellent. A self-organizing map (SOM), spatial autocorrelation and a fuzzy logic model were used to identify spatial groundwater quality patterns from 12 physicochemical variables measured in 67 groundwater wells. The SOM analysis identified four distinct spatial patterns: EC-TDS-Cl, Mg-pH, Ca-K-NO₃ and HCO₃-SO₄-Na-F. In the test phase, the ANN (CV) model (RMSE = 0.041, MAE = 0.026, R² = 0.971, RAE = 0.15 = 21 and CC = 0.986) and the ANN (B) model (RMSE = 0.041, MAE = 0.025, R² = 0.969, RAE = 0.119 and CC = 0.975) both performed better than the Optuna-based models. SO₄, Cl and F played an important role in prediction accuracy. F and SO₄ showed higher spatial autocorrelation, which affected groundwater quality degradation. In addition, the errors of the ANN (CV) and ANN (B) models followed a Gaussian distribution with a small standard error (<1%), indicating that the models are stable.
These results indicate the efficiency of the ANN model in predicting groundwater quality in coastal areas, which would help regional water managers in real-time monitoring and management of sustainable groundwater resources.
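The five test-phase scores reported in the abstract (RMSE, MAE, R², RAE and CC) can be computed from paired observed and predicted index values. The sketch below is a minimal illustration using the standard textbook definitions, not the authors' code: the function name `regression_metrics` and the sample values are hypothetical, R² is assumed to be the coefficient of determination (1 minus residual over total sum of squares), RAE is assumed to be the total absolute error relative to that of a mean-only predictor, and CC is assumed to be the Pearson correlation coefficient.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, R2, RAE and CC for paired observed/predicted values.

    Assumes y_true and y_pred are not constant; otherwise the ratios below
    would divide by zero.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_true = sum(y_true) / n
    mean_pred = sum(y_pred) / n
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)     # total sum of squares
    abs_err = sum(abs(e) for e in errors)                  # total absolute error
    abs_dev = sum(abs(t - mean_true) for t in y_true)      # error of a mean-only predictor
    cov = sum((t - mean_true) * (p - mean_pred) for t, p in zip(y_true, y_pred))
    var_pred = sum((p - mean_pred) ** 2 for p in y_pred)

    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": abs_err / n,
        "R2": 1.0 - ss_res / ss_tot,
        "RAE": abs_err / abs_dev,
        "CC": cov / math.sqrt(ss_tot * var_pred),
    }


# Illustrative values only; these are not data from the study.
observed = [0.20, 0.35, 0.50, 0.65, 0.90]
predicted = [0.22, 0.33, 0.52, 0.60, 0.88]
scores = regression_metrics(observed, predicted)
```

Lower RMSE, MAE and RAE and higher R² and CC indicate a closer fit, which is the sense in which the ANN models are reported to perform better. Note that a model can reach CC = 1 while still having a negative R² if its predictions are systematically offset, so the two scores are not interchangeable.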