Zhang Hongyu, Zhang Li, Li Na, Zhang Yongsheng, Zhang Xiaowen, Wang Dawei
The First Clinical Medical College, Shandong University of Traditional Chinese Medicine, Jinan, China.
Health Management Center, Institute of Health Management, Shandong Engineering Laboratory of Health Management, the First Affiliated Hospital of Shandong First Medical University & Shandong Provincial Qianfoshan Hospital, Jinan, China.
BMC Gastroenterol. 2025 Jul 15;25(1):518. doi: 10.1186/s12876-025-04120-6.
This study aimed to develop an accurate prediction model for the risk of non-alcoholic fatty liver disease (NAFLD) using random survival forests (RSF), and to investigate how NAFLD risk is distributed over time.
This retrospective cohort study included subjects who underwent annual health checkups between 1 January 2021 and 31 December 2024. Using a hold-out strategy, all subjects were divided into a training set and a test set for model development and evaluation. Important predictors were selected from the candidate variables using LASSO regression on the training set. Two prediction models were then constructed, one with the Cox model and one with the RSF model. Feature importance values and their 95% confidence intervals (CIs) were calculated using variable importance (VIMP) with bootstrap resampling. Discrimination and calibration were evaluated using the integrated area under the curve (iAUC), the time-dependent area under the curve (tAUC), the integrated Brier score (iBS), and the time-dependent prediction error (PE).
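As an illustration of one of the evaluation metrics listed above, the following is a minimal sketch of an integrated Brier score. It is not the authors' implementation. It assumes every subject's event time is observed (no censoring), whereas a real survival analysis would typically apply inverse-probability-of-censoring weights. The function names and the `surv` layout (one row per subject, one column per grid time) are hypothetical.

```python
from typing import Sequence


def brier_score_at(t: float, event_times: Sequence[float],
                   surv_at_t: Sequence[float]) -> float:
    """Brier score at time t.

    Assumption: all event times are observed (no censoring), so the
    true status at t is simply whether the event time exceeds t.
    """
    total = 0.0
    for time_i, s_i in zip(event_times, surv_at_t):
        event_free = 1.0 if time_i > t else 0.0  # observed status at t
        total += (event_free - s_i) ** 2         # squared prediction error
    return total / len(event_times)


def integrated_brier_score(grid: Sequence[float],
                           event_times: Sequence[float],
                           surv: Sequence[Sequence[float]]) -> float:
    """Average the Brier score over the time grid (trapezoidal rule).

    surv[i][k] is the predicted probability that subject i is still
    event-free at grid[k]. Lower values indicate better predictions.
    """
    scores = [
        brier_score_at(t, event_times, [row[k] for row in surv])
        for k, t in enumerate(grid)
    ]
    area = sum(
        (grid[k + 1] - grid[k]) * (scores[k] + scores[k + 1]) / 2.0
        for k in range(len(grid) - 1)
    )
    return area / (grid[-1] - grid[0])


if __name__ == "__main__":
    grid = [6.0, 12.0, 18.0, 24.0]        # months, made-up example grid
    event_times = [10.0, 20.0]            # made-up event times
    uninformative = [[0.5] * len(grid) for _ in event_times]
    print(integrated_brier_score(grid, event_times, uninformative))
```

A model that always predicts 0.5 scores 0.25 on this metric, which gives a rough reference point for reading iBS values.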
A total of 18,250 subjects fulfilled the criteria, and 14 predictors were selected through LASSO regression for subsequent model development. The RSF model showed better discrimination (iAUC of 0.856) and calibration (iBS of 0.116) than the Cox model (iAUC of 0.759 and iBS of 0.148). Based on the RSF model predictions, subjects were stratified into high- and low-risk groups that differed significantly, with mean NAFLD-free times of 20.86 and 36.76 months, respectively (P < .0001).
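To show how a mean event-free time can be compared between risk groups, here is a hedged sketch. It assumes the mean is taken as the area under a Kaplan-Meier curve up to a fixed horizon (a restricted mean) and that groups are split at the median predicted risk. The abstract specifies neither choice, so both are assumptions, and all names and data are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def km_restricted_mean(times: Sequence[float], events: Sequence[int],
                       horizon: float) -> float:
    """Area under the Kaplan-Meier curve from 0 to `horizon`.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, area, prev_t = 1.0, 0.0, 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        n_events = n_leaving = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        area += surv * (t - prev_t)               # curve is flat until t
        if n_events:
            surv *= 1.0 - n_events / at_risk      # Kaplan-Meier step
        at_risk -= n_leaving
        prev_t = t
    return area + surv * (horizon - prev_t)


def split_by_median(risk: Sequence[float]) -> List[bool]:
    """True marks the high-risk group (assumed cut point: the median)."""
    cut = median(risk)
    return [r > cut for r in risk]


if __name__ == "__main__":
    # Made-up illustration data, not values from the study.
    risk = [0.9, 0.8, 0.2, 0.1]
    times = [10.0, 14.0, 30.0, 36.0]
    events = [1, 1, 0, 1]
    high = split_by_median(risk)
    for label, flag in (("high", True), ("low", False)):
        t = [x for x, h in zip(times, high) if h == flag]
        e = [x for x, h in zip(events, high) if h == flag]
        print(label, km_restricted_mean(t, e, horizon=36.0))
```

A formal comparison between groups would normally add a significance test such as the log-rank test, which this sketch omits.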
This study developed an RSF model for predicting NAFLD risk that outperformed the traditional Cox model, achieved clear risk stratification for NAFLD, and provided new insights into the distribution of NAFLD risk over time.