Lu Cong, He Ying, Chen Chun-Ru, Wu Lun, Song Dan, Wang Chen-Hong, Zhang Le-Qing, Miao Jing-Yi, Zheng Yong-Bin, Wang Wei
Department of Gastrointestinal Surgery, Renmin Hospital of Wuhan University, Wuhan, 430060, Hubei Province, China.
Department of Stomatology, Renmin Hospital of Wuhan University, Wuhan, 430060, China.
Discov Oncol. 2025 Jun 16;16(1):1120. doi: 10.1007/s12672-025-02894-5.
Primary liver cancer is the sixth most common cancer globally and ranks third in cancer-related mortality. Patients with primary liver cancer and distant metastasis (PLCDM) have particularly low survival rates and are more difficult to treat. This study aims to identify risk factors for distant metastasis and for overall survival (OS) in primary liver cancer, and to determine the best-performing predictive models using machine learning.
We extracted data from the SEER database (Incidence-SEER Research Data, 17 Registries, Nov 2022 Sub (2000-2020)) and identified risk factors for distant metastasis using logistic regression. Eight machine learning models were constructed using the "tidymodels" package in R and evaluated based on ROC curves, AUC, and accuracy. Cox regression was used to identify risk factors for OS, and Cox and Random Survival Forest (RSF) models were compared using time-dependent ROC curves. The best-performing model was interpreted using Shapley analysis. We also developed user-friendly web applications using the "shiny" package in R for clinical use.
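The evaluation metrics named above (AUC and accuracy, together with the Brier score reported in the results) can be illustrated with a minimal sketch. It is written in plain Python rather than the R "tidymodels" workflow the study used, and the function names, the 0.5 classification threshold, and the six toy cases are assumptions made purely for illustration; none of them come from the study.

```python
from typing import Sequence


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(y_true: Sequence[int], y_score: Sequence[float],
             threshold: float = 0.5) -> float:
    """Share of cases classified correctly at an assumed probability threshold."""
    correct = sum(int(s >= threshold) == y for y, s in zip(y_true, y_score))
    return correct / len(y_true)


def brier_score(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome
    (lower is better)."""
    return sum((s - y) ** 2 for y, s in zip(y_true, y_score)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = distant metastasis, 0 = none.
    toy_labels = [0, 0, 1, 1, 0, 1]
    toy_probs = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
    print("AUC:", auc(toy_labels, toy_probs))
    print("Accuracy:", accuracy(toy_labels, toy_probs))
    print("Brier score:", brier_score(toy_labels, toy_probs))
```

The pairwise AUC loop is quadratic in the number of cases, which is acceptable for a sketch but would normally be replaced by a rank-based computation on registry-scale data.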
Multivariate analysis identified grade, T stage, N stage, tumor size, and surgery as independent risk factors for PLCDM. The Random Forest (RF) model showed the best performance with AUC values of 0.836, 0.817, and 0.846 in the training, internal validation, and external validation cohorts, respectively, and favorable Brier scores and accuracy. Shapley analysis ranked the risk factors by contribution as surgery, T stage, tumor size, N stage, and grade. Cox regression identified grade, surgery, and T stage as independent prognostic factors for OS. The Cox model outperformed the RSF model in time-dependent ROC analysis. Calibration and decision curve analysis (DCA) further confirmed the Cox model's strong predictive performance and clinical utility. For OS, Shapley analysis ranked the prognostic factors by contribution as grade, surgery, and T stage.
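To make the Shapley-based rankings more concrete, the following minimal sketch computes exact Shapley values for a single prediction by enumerating feature subsets. It is not the study's implementation: the scoring function `toy_score`, its coefficients, the example patient, and the baseline are all hypothetical, and exact enumeration is only practical for a handful of features (applied tools typically rely on approximations).

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict


def shapley_values(predict: Callable[[Dict[str, float]], float],
                   instance: Dict[str, float],
                   baseline: Dict[str, float]) -> Dict[str, float]:
    """Exact Shapley values for one prediction.

    Features outside a coalition are set to their baseline value;
    features inside it take the instance's value.
    """
    names = list(instance)
    n = len(names)

    def value(coalition: set) -> float:
        mixed = {k: (instance[k] if k in coalition else baseline[k])
                 for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_score(x: Dict[str, float]) -> float:
    """Hypothetical risk score with made-up coefficients (illustration only)."""
    return (0.05 * x["t_stage"] + 0.02 * x["grade"] - 0.30 * x["surgery"]
            + 0.03 * x["t_stage"] * (1 - x["surgery"]))


if __name__ == "__main__":
    patient = {"surgery": 0, "t_stage": 4, "grade": 3}   # hypothetical case
    reference = {"surgery": 1, "t_stage": 1, "grade": 1}  # hypothetical baseline
    contributions = shapley_values(toy_score, patient, reference)
    # Rank features by absolute contribution, largest first.
    for name, phi in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
        print(name, round(phi, 4))
```

By construction the contributions sum to the difference between the prediction for the case and the prediction for the baseline, which is the property that allows factors to be ranked on a common scale.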
We successfully constructed and validated optimal models for predicting PLCDM and its prognosis. These models provide valuable tools to guide clinical decision-making for PLCDM.