Sahour Soheil, Khanbeyki Matin, Gholami Vahid, Sahour Hossein, Kahvazade Irene, Karimi Hadi
Rouzbahan Institute of Higher Education, Sari, Iran.
Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran.
Environ Sci Pollut Res Int. 2023 Apr;30(16):46004-46021. doi: 10.1007/s11356-023-25596-3. Epub 2023 Jan 30.
Groundwater quality is typically measured through water sampling and laboratory analysis. Such field-based measurements are costly and time-consuming when applied over a large domain. In this study, we developed a machine learning (ML)-based framework to map groundwater quality in an unconfined aquifer in the north of Iran. Groundwater samples were collected from 248 monitoring wells across the region. The groundwater quality index (GWQI) at each well was calculated and assigned to one of four classes (very poor, poor, good, or excellent) according to the index cut-off values. Factors affecting groundwater quality, including distance to industrial centers, distance to residential areas, population density, aquifer transmissivity, precipitation, evaporation, geology, and elevation, were identified and prepared in a GIS environment. Six ML classifiers, namely extreme gradient boosting (XGB), random forest (RF), support vector machine (SVM), artificial neural network (ANN), k-nearest neighbor (KNN), and Gaussian classifier model (GCM), were used to establish relationships between GWQI and its controlling factors. The algorithms were evaluated using the receiver operating characteristic (ROC) curve and statistical performance metrics (overall accuracy, precision, recall, and F1 score). The accuracy assessment showed that the ML algorithms predicted groundwater quality with high accuracy. RF was selected as the optimal model because of its higher accuracy (overall accuracy, precision, and recall = 0.92; area under the ROC curve = 0.95). The trained RF model was then used to map GWQI classes across the entire region. The poor GWQI class is dominant, covering 66% of the study area, followed by the good (19%), very poor (14%), and excellent (< 1%) classes. An area of very poor GWQI was observed in the north. Feature analysis indicated that distance to industrial locations is the main factor affecting groundwater quality in the region. The study provides a cost-effective methodology for groundwater quality modeling that can be replicated in other regions with similar hydrological and geological settings.
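The evaluation metrics named in the abstract (overall accuracy, precision, recall, and F1 score) can be illustrated with a minimal sketch for the four GWQI classes. The labels below are made-up toy data, not the study's results, and the function name `evaluate` and the use of macro averaging across classes are assumptions; the abstract does not state how the per-class scores were averaged.

```python
from collections import Counter

# The four GWQI classes named in the abstract.
CLASSES = ["very poor", "poor", "good", "excellent"]


def evaluate(y_true, y_pred, classes=CLASSES):
    """Overall accuracy plus macro-averaged precision, recall and F1.

    Macro averaging (an unweighted mean over classes) is an assumption
    made for this sketch; it is not confirmed by the source.
    """
    # Count every (true class, predicted class) pair: a sparse confusion matrix.
    pairs = Counter(zip(y_true, y_pred))

    accuracy = sum(pairs[(c, c)] for c in classes) / len(y_true)

    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = pairs[(c, c)]
        predicted = sum(pairs[(t, c)] for t in classes)  # wells predicted as c
        actual = sum(pairs[(c, p)] for p in classes)     # wells truly in c
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(classes)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels for ten wells (illustrative only).
y_true = ["poor"] * 5 + ["good"] * 2 + ["very poor"] * 2 + ["excellent"]
y_pred = (["poor"] * 4 + ["good"]        # one poor well misclassified as good
          + ["good"] * 2
          + ["very poor", "poor"]        # one very poor well misclassified as poor
          + ["excellent"])

report = evaluate(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

Reporting per-class precision and recall alongside overall accuracy matters here because the classes are strongly imbalanced (the excellent class covers less than 1% of the mapped area), so accuracy alone could hide poor performance on the rare classes.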