Chen Yutao, Li Ning, Xing Fucheng, Xiang Han, Chen Zilong
College of Emergency Management, Xihua University, Chengdu, 610039, China.
Sichuan Huadi Construction Engineering Co., Chengdu, 610039, China.
Sci Rep. 2025 Jul 2;15(1):22480. doi: 10.1038/s41598-025-03479-6.
Debris flow is a highly concentrated, non-homogeneous fluid formed by melting snow and ice and by heavy rainfall, and both its formation and its movement are complex processes. Debris flow susceptibility assessment is crucial for disaster warning and mitigation. Because traditional methods struggle to predict debris flows precisely, machine learning algorithms have been applied increasingly in this field in recent years. In this paper, debris flow susceptibility assessment models are built from RF (Random Forest) and XGBoost (Extreme Gradient Boosting) and combined with the Stacking ensemble learning method, and the SPY technique is introduced to optimize the selection of negative samples. The results show that the SPY-RF model, with an area under the ROC curve (AUC) of 0.93, clearly outperforms the original RF model (AUC = 0.82) and performs well at all susceptibility levels, particularly in the very high susceptibility zone, which has the highest debris flow density. Likewise, the SPY-XGBoost model (AUC = 0.87) outperforms the original XGBoost model (AUC = 0.72). These results suggest that the SPY technique improves the prediction accuracy and reliability of the models and is especially effective at reducing the misclassification of non-prone areas. In contrast, the high correlation among base-learner features prevented the Stacking-RF and Stacking-XGBoost models from improving prediction performance further: their AUC values of 0.80 and 0.71 were slightly below those of the original RF and XGBoost models. The factor contribution analysis indicates that the main factors influencing debris flow susceptibility are the stream power index (SPI), rainfall, curvature, and area. Of these, SPI contributes the most, indicating the critical role of water flow intensity in debris flow formation. This study demonstrates the benefits of integrating the SPY technique with ensemble learning. It also examines the shortcomings of the Stacking model in debris flow prediction, offering a valuable avenue for future research on optimizing model diversity and enhancing prediction performance.
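The abstract does not spell out how the SPY technique selects negative samples. In the positive-unlabeled (PU) learning literature, the "spy" idea works as follows: a fraction of the known positives (here, debris flow sites) is hidden as "spies" in the unlabeled pool, and a classifier is trained to separate the remaining positives from that pool. The spies' scores then set a threshold, and unlabeled samples scoring below it are kept as reliable negatives. The following is a minimal, dependency-free Python sketch of that idea under stated assumptions. It is not the authors' implementation. The classifier is a tiny Gaussian naive Bayes swapped in purely to keep the sketch self-contained (the study pairs SPY with RF and XGBoost). The helper names and the `spy_ratio` and `noise` parameters are hypothetical.

```python
import math
import random


def fit_gaussian_nb(X, y):
    """Fit a minimal Gaussian naive Bayes model (hypothetical stand-in classifier)."""
    model = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        n_features = len(rows[0])
        means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
        # A small variance floor avoids division by zero for constant features.
        variances = [
            sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + 1e-9
            for j in range(n_features)
        ]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict_proba(model, x):
    """Return P(class = 1 | x) under the fitted naive Bayes model."""
    log_post = {}
    for label, (prior, means, variances) in model.items():
        lp = math.log(prior)
        for xj, m, v in zip(x, means, variances):
            lp += -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        log_post[label] = lp
    # Normalize in log space for numerical stability.
    top = max(log_post.values())
    expd = {k: math.exp(v - top) for k, v in log_post.items()}
    return expd[1] / (expd[0] + expd[1])


def spy_reliable_negatives(positives, unlabeled, spy_ratio=0.15, noise=0.05, seed=0):
    """Return indices of unlabeled samples judged to be reliable negatives.

    Assumes len(positives) exceeds the number of spies, so some positives remain.
    """
    rng = random.Random(seed)
    n_spies = max(1, int(len(positives) * spy_ratio))
    spy_idx = set(rng.sample(range(len(positives)), n_spies))
    spies = [p for i, p in enumerate(positives) if i in spy_idx]
    kept_pos = [p for i, p in enumerate(positives) if i not in spy_idx]

    # Spies are hidden in the unlabeled pool and labeled 0 together with it.
    X = kept_pos + unlabeled + spies
    y = [1] * len(kept_pos) + [0] * (len(unlabeled) + len(spies))
    model = fit_gaussian_nb(X, y)

    # Threshold t: roughly a `noise` fraction of spies (rounded down) score below t.
    spy_scores = sorted(predict_proba(model, s) for s in spies)
    t = spy_scores[int(noise * len(spy_scores))]

    # Unlabeled samples scoring below every (or nearly every) spy are kept as negatives.
    return [i for i, u in enumerate(unlabeled) if predict_proba(model, u) < t]
```

In a susceptibility workflow, the returned indices would identify candidate non-debris-flow units to sample as negatives when training RF or XGBoost. That downstream step, and the actual conditioning factors (SPI, rainfall, curvature, area, and so on), are outside this sketch.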