Yuan Luyun, Wang Liyu, Gao Jiamin, Chen Xin, Wang Haoyue, Tan Wei Shan, Sun Kexiang, Gong Yabin, Deng Wanli
Department of Oncology, Putuo Hospital, Shanghai University of Traditional Chinese Medicine, No. 164, Lanxi Road, Putuo District, Shanghai, 200062, China.
Oncology Department I, Yueyang Hospital of Integrated Traditional Chinese and Western Medicine, Shanghai University of Traditional Chinese Medicine, No. 110, Ganhe Road, Hongkou District, Shanghai, 200080, China.
J Transl Med. 2025 Jun 22;23(1):695. doi: 10.1186/s12967-025-06663-4.
Although the overall incidence of colorectal cancer (CRC) is declining, the incidence of early-onset CRC is rising. No prognostic models currently exist for predicting postoperative survival in stage I-III early-onset colon or rectal cancer. Such tools are urgently needed to enable individualized risk assessment.
We identified patients with early-onset (EO) and late-onset (LO) colon or rectal cancer from the Surveillance, Epidemiology, and End Results (SEER) database and randomly split them into training and test cohorts (7:3). External cohorts of EO colon and rectal cancer were collected from two Chinese hospitals. After LASSO-Cox feature selection, six models were developed to predict cancer-specific survival (CSS): random survival forest (RSF), LASSO-Cox, survival support vector machine (S-SVM), XGBSE, gradient boosting survival analysis (GBSA), and DeepSurv. Performance was assessed using the concordance index (C-index), Brier score, time-dependent area under the curve (AUC), calibration curves, and decision curves. SHAP was used for model interpretation. A risk stratification system and an online calculator were constructed based on the best-performing model.
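As an illustration of the main discrimination metric named above, the following is a minimal sketch of Harrell's C-index for right-censored survival data, written in plain Python. It is not the authors' code: the function name `harrell_c_index`, the toy follow-up times, and the risk scores are hypothetical, and pairs with tied survival times are simply skipped, which is a simplification.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    times       -- observed follow-up time per patient
    events      -- 1 if the event (e.g. cancer-specific death) was observed,
                   0 if the patient was censored
    risk_scores -- model output; a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Reorder so that patient i has the shorter follow-up time.
        if times[j] < times[i]:
            i, j = j, i
        # A pair is comparable only if the shorter time is an observed event.
        # Tied times are skipped here (a simplification).
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0   # earlier event had the higher risk score
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5   # tied scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: five patients, two of them censored.
toy_times = [5, 10, 12, 20, 30]
toy_events = [1, 0, 1, 1, 0]
toy_risk = [0.9, 0.4, 0.2, 0.7, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. In the toy data, 6 of the 7 comparable pairs are ranked correctly, so `toy_c` is 6/7 (about 0.857).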
A total of 3,997 EO colon cancer, 2,016 EO rectal cancer, 30,621 LO colon cancer, and 8,667 LO rectal cancer patients from SEER, along with 205 EO colon cancer and 153 EO rectal cancer patients from Chinese institutions, were included in the study. Based on comprehensive evaluation across multiple datasets and metrics, the RSF model demonstrated the best and most stable performance, outperforming both the other machine learning models and the traditional TNM staging system. In EO colon cancer, the RSF model achieved C-indices of 0.738 (test cohort) and 0.829 (external validation), mean AUCs of 0.765 and 0.889, and integrated Brier scores of 0.084 and 0.077, respectively. In EO rectal cancer, the corresponding C-indices were 0.728 and 0.722, mean AUCs were 0.753 and 0.900, and integrated Brier scores were 0.106 and 0.095. Calibration and decision curves further confirmed the RSF model's good calibration and clinical net benefit. The RSF model also showed robust performance in the LO colon and rectal cancer cohorts. SHAP analysis was used to quantify the marginal contribution of each predictor within each cancer subtype. Based on the RSF model, we developed a CSS-based risk stratification framework and deployed an online prediction tool.
In summary, we selected the RSF model for its strong predictive performance and named it OncoE25. It is intended to support personalized health management for patients with EO colon or rectal cancer.