Flood susceptibility mapping in arid urban areas using SHAP-enhanced stacked ensemble learning: A case study of Jeddah.

Authors

Zerouali Bilel, Almaliki Abdulrazak H, Santos Celso Augusto Guimarães

Affiliations

Laboratory of Architecture, Cities and Environment, Faculty of Civil Engineering and Architecture, Department of Hydraulic, Hassiba Benbouali University of Chlef, B.P. 78C, Ouled Fares, 02180, Algeria.

Department of Civil Engineering, College of Engineering, Taif University, P.O. Box 11099, Taif, 21944, Saudi Arabia.

Publication information

J Environ Manage. 2025 Aug 28;393:127128. doi: 10.1016/j.jenvman.2025.127128.

DOI:10.1016/j.jenvman.2025.127128

PMID:40882276

Abstract

Flooding is an escalating hazard in arid, rapidly urbanizing environments such as Jeddah, Saudi Arabia, where the lack of historical flood records and sparse monitoring systems hinder effective risk prediction. To address this gap, this study develops an accurate and interpretable flood susceptibility mapping framework tailored to data-scarce urban settings. The framework couples a stacked ensemble of three machine learning models (XGBoost, CatBoost, and Histogram-based Gradient Boosting (HGB)) with SHapley Additive exPlanations (SHAP) to improve both prediction accuracy and model transparency. Random Forest was excluded from the final stack because of its inferior classification performance. A diverse set of geospatial predictors was used, including a digital elevation model, slope, flow direction, Curve Number, topographic indices, and land use/land cover (LULC) derived from ESRI Sentinel-2 data. Model validation used 92 flooded and 198 non-flooded points, on which the model achieved strong predictive performance (AUC = 0.92, accuracy = 0.82). In the absence of official flood records, the model outputs were intersected with road network data, identifying 395 road points in highly susceptible zones. Because detailed flood event records, particularly those related to infrastructure, are generally lacking in the region, these points do not constitute a formal validation dataset; they do, however, provide a valuable proxy for identifying flood-prone road segments. SHAP analysis revealed that the Terrain Ruggedness Index (TRI), the Topographic Position Index (TPI), and distance to rivers were the most influential features globally, while Curve Number and LULC were the key drivers of high-risk predictions. The model classified 139 km² (8.7 %) of the study area as very high flood susceptibility and 325 km² (20.3 %) as high susceptibility, and the stacked ensemble outperformed the individual learners.
These results confirm that stacked ensemble learning, paired with explainable AI and creative validation strategies, can produce reliable flood susceptibility maps even in data-constrained contexts. This framework offers a transferable and scalable solution for flood risk assessment in similar arid and urbanizing environments.
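The abstract does not state how the three base learners are combined, so the following is only a hedged, pure-Python sketch of the generic stacking recipe. It assumes each base learner (XGBoost, CatBoost, and HGB in the study) is wrapped as a hypothetical `fit` function that returns a P(flood) predictor. Out-of-fold probabilities serve as meta-features, and a logistic-regression meta-learner is fitted on them. The logistic meta-learner, the interleaved K-fold split, and every function name here are assumptions, not the authors' configuration. `roc_auc` computes AUC as the Mann-Whitney probability that a randomly chosen flooded point scores higher than a randomly chosen non-flooded one, which is the statistic behind the reported AUC = 0.92.

```python
import math
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Predictor = Callable[[Row], float]
# Hypothetical interface (assumption): a "fit" function trains on rows and 0/1 labels
# and returns a predictor giving P(flood) for one row. XGBoost, CatBoost or HGB
# would each be wrapped this way.
FitFn = Callable[[List[Row], List[int]], Predictor]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def out_of_fold_probs(fits: List[FitFn], X: List[Row], y: List[int], k: int = 5) -> List[List[float]]:
    """Meta-features: column j holds base learner j's P(flood) for each row,
    predicted by a model trained without that row's fold."""
    n = len(X)
    meta = [[0.0] * len(fits) for _ in range(n)]
    for f in range(k):
        held_out = list(range(f, n, k))  # interleaved fold f
        if not held_out:
            continue
        held = set(held_out)
        X_tr = [X[i] for i in range(n) if i not in held]
        y_tr = [y[i] for i in range(n) if i not in held]
        for j, fit in enumerate(fits):
            model = fit(X_tr, y_tr)
            for i in held_out:
                meta[i][j] = model(X[i])
    return meta


def fit_meta_logistic(meta_X: List[List[float]], y: List[int],
                      lr: float = 0.1, epochs: int = 2000) -> Tuple[List[float], float]:
    """Assumed meta-learner: logistic regression fitted by batch gradient descent on log-loss."""
    m, n = len(meta_X[0]), len(y)
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * m, 0.0
        for x, t in zip(meta_X, y):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - t
            for j in range(m):
                gw[j] += err * x[j]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def stack(fits: List[FitFn], X: List[Row], y: List[int], k: int = 5) -> Predictor:
    """Fit the meta-learner on out-of-fold outputs, then refit base learners on all data."""
    w, b = fit_meta_logistic(out_of_fold_probs(fits, X, y, k), y)
    bases = [fit(X, y) for fit in fits]
    return lambda row: sigmoid(b + sum(wi * base(row) for wi, base in zip(w, bases)))


def roc_auc(labels: List[int], scores: List[float]) -> float:
    """AUC = P(random flooded point scores above random non-flooded point); ties count 1/2."""
    pos = [s for t, s in zip(labels, scores) if t == 1]
    neg = [s for t, s in zip(labels, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both flooded and non-flooded points")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Training the meta-learner on out-of-fold rather than in-sample probabilities keeps it from simply rewarding whichever base learner overfits the training points most. On held-out flooded and non-flooded points, performance would then be checked with something like `roc_auc(y_val, [model(r) for r in X_val])`.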
