Han Jeongho, Guzman Jorge A, Chu Maria L
Department of Agricultural and Biological Engineering, The GRAINGER College of Engineering, College of Agricultural, Consumer & Environmental Sciences, ACES, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA; Agriculture and Life Sciences Research Institute, Kangwon National University, Chuncheon, 24341, Republic of Korea.
Department of Agricultural and Biological Engineering, The GRAINGER College of Engineering, College of Agricultural, Consumer & Environmental Sciences, ACES, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
J Environ Manage. 2025 May;383:125478. doi: 10.1016/j.jenvman.2025.125478. Epub 2025 Apr 25.
This study develops a novel explainable stacking ensemble model that combines the stacked generalization ensemble method with SHapley Additive exPlanations (SHAP) to enhance the prediction and interpretation of gully erosion susceptibility. Applied to Jefferson County, Illinois, our approach uses Random Forest (RF), Gradient Boosting Machine (GBM), Logistic Regression (LR), and Deep Neural Networks (DNN) as both base and meta-learners in various configurations, resulting in 44 distinct stacking models. The comparative analysis demonstrated the superior predictive performance of the stacked models when evaluated at 200 randomly selected gully sites based on LiDAR difference observations; all but three exceeded the area under the curve (AUC) value of 0.86 achieved by the best-performing base model (GBM). The LR stacking model, combining RF and GBM as base models with LR as the meta-learner, emerged as the most effective, achieving an AUC of 0.916. The gully erosion susceptibility map produced by the LR stacking model classified 33 % of the agricultural land (89,208 ha) as the "very high" class, compared to 27 %, 87 %, 27 %, and 55 % predicted by the individual RF, LR, GBM, and DNN models, respectively. Crucially, SHAP analysis elucidated how changes in feature values influence model behavior, considering feature interactions within both the base models and the meta-learner. SHAP identified the annual leaf area index (LAI) as the most influential feature in both the RF and GBM base models. Additionally, it highlighted the importance of the GBM base model relative to the RF base model in the final decision-making process of the stacking model. By offering a transparent mechanism to evaluate how different features and models contribute to final decisions, this approach can be extended to broader environmental management and policy-making contexts, facilitating more informed and responsible resource allocation.
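The best-performing configuration described above (RF and GBM as base models, LR as the meta-learner, scored by AUC) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's `StackingClassifier`, uses synthetic data from `make_classification` in place of the Jefferson County predictors, and all hyperparameters, the 5-fold setting, and the variable names are illustrative choices. The meta-learner coefficients printed at the end are only a rough stand-in for how much weight each base model receives; they are not SHAP values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for gully / non-gully samples (assumption: the real
# study uses terrain, vegetation, and other predictors not reproduced here).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, random_state=0
)

# Hold out 200 points for evaluation, mirroring the size of the
# evaluation set mentioned in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=200, stratify=y, random_state=0
)

# Base learners: RF and GBM. Meta-learner: LR trained on the base
# learners' out-of-fold predicted probabilities (stacked generalization).
base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("gbm", GradientBoostingClassifier(random_state=0)),
]
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_train, y_train)

# AUC of the stacked model and of each fitted base model on the held-out set.
auc_by_model = {
    "stack": roc_auc_score(y_test, stack.predict_proba(X_test)[:, 1])
}
for name, fitted in stack.named_estimators_.items():
    auc_by_model[name] = roc_auc_score(y_test, fitted.predict_proba(X_test)[:, 1])

# One LR coefficient per base model (binary case): a crude view of how
# strongly the meta-learner relies on each base model's probability.
meta_coefficients = dict(
    zip(stack.named_estimators_.keys(), stack.final_estimator_.coef_[0])
)

for name, auc in auc_by_model.items():
    print(f"{name}: AUC = {auc:.3f}")
print("meta-learner coefficients:", meta_coefficients)
```

On synthetic data the relative ranking of the three models carries no meaning for gully erosion; the sketch only shows how a stacked model and its base models can be scored on the same held-out points.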