Department of Environmental Engineering, Faculty of Water and Environmental Engineering, Shahid Chamran University of Ahvaz, Ahvaz, Iran.
Division of Water Resources Engineering & Center for Advanced Middle Eastern Studies, Lund University, Lund, Sweden.
Environ Sci Pollut Res Int. 2024 Jun;31(29):42088-42110. doi: 10.1007/s11356-024-33920-8. Epub 2024 Jun 11.
The temporal aspect of groundwater vulnerability to contaminants such as nitrate is often overlooked because vulnerability is assumed to be static. This study addresses this gap by combining machine learning with the Detecting Breakpoints and Estimating Segments in Trend (DBEST) algorithm to reveal the underlying relationships among nitrate, water table, vegetation cover, and precipitation time series, which are related to agricultural activities and groundwater demand in a semi-arid region. The contamination probability of the Lenjanat Plain was mapped by comparing random forest (RF), support vector machine (SVM), and K-nearest neighbors (KNN) models, each fed with 32 input variables (DEM-derived factors, physiography, distance and density maps, and time series data). Imbalanced learning and feature selection techniques were also investigated as supplementary methods, giving four scenarios in total. The RF model, integrated with forward sequential feature selection (SFS) and the SMOTE-Tomek resampling method, outperformed the other models (F-score: 0.94, MCC: 0.83). The SFS techniques improved model accuracy more than the other feature selection methods, at the cost of greater computational expense, and the cost-sensitive function proved more efficient in tackling imbalanced data than the other investigated methods. The DBEST method identified significant breakpoints within each time series, revealing a clear association between agricultural practices along the Zayandehrood River and substantial nitrate contamination within the Lenjanat region. Additionally, the groundwater vulnerability maps created using the candidate RF model and an ensemble of the best RF, SVM, and KNN models predicted mid to high levels of vulnerability in the central parts and the downhill areas in the southwest.
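The two scores quoted above, the F-score and the Matthews correlation coefficient (MCC), are both computed from the confusion matrix of a binary classifier. The following is a minimal sketch of the standard definitions in plain Python. The function names and the label vectors are hypothetical illustrations, not the study's code or data.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def f_score(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical imbalanced sample: 9 contaminated wells (1), 1 clean well (0),
# scored against a trivial classifier that labels every well as contaminated.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
y_pred = [1] * 10

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("F-score:", round(f_score(tp, fp, fn), 3))
print("MCC:", round(mcc(tp, fp, tn, fn), 3))
```

In this made-up sample the trivial classifier still earns a high F-score, while its MCC falls to zero because the score uses all four cells of the confusion matrix. This is one common reason to report both metrics when the classes are imbalanced.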