School of Hydraulic Engineering, Dalian University of Technology, Dalian, 116024, China.
British Geological Survey, Keyworth, Nottingham, NG12 5GG, UK.
Environ Geochem Health. 2024 Oct 29;46(11):482. doi: 10.1007/s10653-024-02201-1.
Groundwater nitrate contamination poses a potential threat to human health and environmental safety globally. This study proposes an interpretable stacking ensemble learning (SEL) framework that improves and explains spatial predictions of groundwater nitrate by combining a two-level heterogeneous SEL model with SHapley Additive exPlanations (SHAP). In the SEL model, five commonly used machine learning models serve as base models (gradient boosting decision tree, extreme gradient boosting, random forest, extremely randomized trees, and k-nearest neighbor), and their outputs are the input data for the meta-model. When applied to the Eden Valley, an agriculturally intensive area in the UK, the SEL model outperformed the individual models in predictive performance and generalization ability. The model predicts a mean groundwater nitrate level of 2.22 mg/L-N, with 2.46% of sandstone aquifers exceeding the drinking standard of 11.3 mg/L-N. Alarmingly, 8.74% of areas with high groundwater nitrate remain outside the designated nitrate vulnerable zones. Moreover, SHAP identified transmissivity, baseflow index, hydraulic conductivity, the percentage of arable land, and the soil C:N ratio as the top five driving factors of groundwater nitrate. With nitrate threatening groundwater globally, this study presents a high-accuracy, interpretable, and flexible modeling framework that enhances our understanding of the mechanisms behind groundwater nitrate contamination. The interpretable SEL framework therefore shows great promise for providing valuable evidence for environmental management, water resource protection, and sustainable development, particularly in data-scarce areas.