Centre de Recherche Scientifique et Technique sur les Régions Arides, CRSTRA, Biskra, Algeria.
Department of Agricultural Sciences, University of Mohammed Khider, Biskra, Algeria.
Environ Sci Pollut Res Int. 2024 Aug;31(36):48955-48971. doi: 10.1007/s11356-024-34440-1. Epub 2024 Jul 23.
The complexity of the groundwater salinization process and the lack of data on its controlling factors are the main challenges for accurately predicting and mapping aquifer salinity. To address this, machine learning (ML) methods are employed to model and map groundwater salinity (GWS) in the Mio-Pliocene aquifer of the Sidi Okba region, Algeria, based on a limited dataset of electrical conductivity (EC) measurements and readily available digital elevation model (DEM) derivatives. The dataset was randomly split into training (70%) and testing (30%) sets, and three wrapper feature selection methods were applied to the training data: recursive feature elimination (RFE), forward feature selection (FFS), and backward feature selection (BFS). The resulting input combinations were used for five ML models: random forest (RF), hybrid neuro-fuzzy inference system (HyFIS), K-nearest neighbors (KNN), cubist regression model (CRM), and support vector machine (SVM). The best-performing model was identified and applied to predict and map GWS across the entire study area. The results highlight that the combination of input variables produced by these selection methods, a critical factor often overlooked by researchers, substantially affects model accuracy. Among the alternatives, the RF model emerged as the most effective for predicting and mapping GWS in the study area, achieving high performance in both the training (RMSE = 1.016, R = 0.854, and MAE = 0.759) and testing (RMSE = 1.069, R = 0.831, and MAE = 0.921) phases. The generated digital map highlights the alarming extent of excessive GWS levels in the study area, particularly in low-elevation zones and in areas far from the Foum Elgherza dam and Elbiraz wadi. Overall, this study represents a significant advancement over previous approaches, offering enhanced predictive performance for GWS with a minimal number of input variables.
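The workflow summarized above (a random 70/30 split, wrapper-based selection of DEM-derived inputs on the training set only, and evaluation with RMSE, R and MAE) can be illustrated with a minimal, dependency-free Python sketch. It rests on several assumptions not stated in the abstract: a hand-written k-nearest-neighbour regressor stands in for the wrapped model, candidate subsets are scored by leave-one-out RMSE, R is taken to be the Pearson correlation coefficient, and the inputs (`elevation`, `dist_to_dam`, `noise`) and data are synthetic and hypothetical. Helper names such as `forward_select` are illustrative. Only forward feature selection (FFS) is shown, and this is not the authors' implementation.

```python
import math
import random


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def pearson_r(obs, pred):
    """Pearson correlation coefficient (assumed meaning of the reported R)."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def train_test_split(rows, train_frac=0.7, seed=42):
    """Randomly split rows into training (70%) and testing (30%) sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, x, features, k=3):
    """Wrapped model (an assumption): k-NN regression on the chosen features."""
    dists = sorted(
        (math.dist([r[f] for f in features], [x[f] for f in features]), y)
        for r, y in train
    )
    return sum(y for _, y in dists[:k]) / k


def loo_rmse(train, features, k=3):
    """Score a feature subset by leave-one-out RMSE within the training set."""
    preds = [knn_predict(train[:i] + train[i + 1:], x, features, k)
             for i, (x, _) in enumerate(train)]
    return rmse([y for _, y in train], preds)


def forward_select(train, candidates, k=3):
    """Greedy forward selection: repeatedly add the candidate that most lowers
    the wrapper score; stop as soon as no remaining candidate improves it."""
    selected, best = [], math.inf
    remaining = list(candidates)
    while remaining:
        score, feat = min((loo_rmse(train, selected + [f], k), f) for f in remaining)
        if score >= best:
            break
        selected.append(feat)
        remaining.remove(feat)
        best = score
    return selected, best


# Synthetic, hypothetical data: EC rises at low elevation and far from a
# recharge source; "noise" is an irrelevant candidate input. All inputs are
# on a 0-1 scale so the k-NN distances are comparable.
rng = random.Random(0)
rows = []
for _ in range(60):
    x = {"elevation": rng.uniform(0, 1), "dist_to_dam": rng.uniform(0, 1),
         "noise": rng.uniform(0, 1)}
    ec = 8 - 4 * x["elevation"] + 3 * x["dist_to_dam"] + rng.gauss(0, 0.2)
    rows.append((x, ec))

train, test = train_test_split(rows)
features, cv_rmse = forward_select(train, ["elevation", "dist_to_dam", "noise"])
obs = [y for _, y in test]
preds = [knn_predict(train, x, features) for x, _ in test]
print("selected inputs:", features)
print(f"test RMSE={rmse(obs, preds):.3f}  R={pearson_r(obs, preds):.3f}  "
      f"MAE={mae(obs, preds):.3f}")
```

Selection touches only the training set, so the 30% test set remains an unbiased check of the final model. Backward feature selection (BFS) is the mirror image: start from all candidates and repeatedly drop the input whose removal most lowers the score. RFE typically differs in that it ranks inputs by the fitted model's own importance measure and eliminates the least important one at each step, rather than re-scoring every candidate subset.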