Faculty of Geography, Yunnan Normal University, Kunming, Yunnan Province, China.
Badong National Observation and Research Station of Geohazards, China University of Geosciences (Wuhan), Wuhan, Hubei Province, China.
PLoS One. 2023 Oct 12;18(10):e0292897. doi: 10.1371/journal.pone.0292897. eCollection 2023.
The number of input factors affects the prediction accuracy of a model, so factor screening plays an important role as the starting point for data input. This study explores how different factor screening methods influence the prediction results. Taking the 2014 landslide inventory of Jingdong County as an example, a landslide database was constructed from 136 landslide events and 11 selected factors and randomly divided into training and test datasets at a ratio of 7:3. Four factor screening methods, namely the information gain ratio (IGR), GeoDetector, the Pearson correlation coefficient and the multicollinearity test (MT), were used to screen the factors. A random forest (RF) model was then combined with each factor set for landslide susceptibility mapping (LSM), and accuracy was validated using confusion matrices and ROC curves. First, the results show that factor screening improves the accuracy of the resulting model compared with the original model. Second, the IGR_RF model had the highest AUC value (0.9334), higher than that of the MT_RF model without factor screening (0.9194), and it predicted the most landslides in the very high susceptibility zone (51.22%), indicating good prediction performance. Finally, the factor weighting analysis revealed that NDVI, elevation and aspect had the greatest influence on landslides in Jingdong County, whereas curvature had the least. This study can provide a reference for factor screening in LSM.
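As an illustration of the IGR screening step named in the abstract, the following is a minimal sketch of the standard information gain ratio (information gain divided by split information) for a discretized factor. The abstract does not say how the factors were classified or which threshold was used to retain them, so the function names, the factor classes and the toy labels below are all hypothetical and are not the authors' data or implementation.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def information_gain_ratio(factor_classes, labels):
    """IGR of one discretized factor with respect to landslide labels.

    IGR = (H(labels) - H(labels | factor)) / H(factor)
    """
    total = len(labels)
    # Group the landslide labels by the class of the factor.
    groups = {}
    for cls, label in zip(factor_classes, labels):
        groups.setdefault(cls, []).append(label)
    # Weighted entropy of the labels within each factor class.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(factor_classes)
    # A factor with a single class carries no information.
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy sample: 1 = landslide, 0 = non-landslide.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical factor classes, invented for illustration only.
slope_class = ["steep"] * 4 + ["gentle"] * 4      # separates the labels perfectly
aspect_class = ["N", "S", "N", "S", "N", "S", "N", "S"]  # unrelated to the labels

scores = {
    "slope": information_gain_ratio(slope_class, labels),
    "aspect": information_gain_ratio(aspect_class, labels),
}
# Rank factors from most to least informative.
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

In a screening workflow of this kind, factors whose IGR falls at or near zero would be candidates for removal before the factor set is passed to the classifier; the cutoff itself is a modelling choice that the abstract does not report.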