Liu Rongqiang, Wu Shinan, Kuang Tianrui, Gao Xiaofeng, Wang Jianguo, Ye Jing, Liu Wenxuan, Shi Qiao, Zhao Kailiang, Yu Jia, Wang Weixing
Department of Hepatobiliary Surgery, Renmin Hospital of Wuhan University, Wuhan, Hubei Province, China.
School of Medicine, Xiamen University, Eye Institute of Xiamen University, Xiamen, Fujian, China.
Int J Surg. 2025 Jul 17. doi: 10.1097/JS9.0000000000002901.
The prognosis for gallbladder cancer (GBC) patients is generally poor because distant metastasis (DM) often occurs early. However, research on predicting the risk of DM in GBC patients remains limited. This study therefore aimed to use the Surveillance, Epidemiology, and End Results (SEER) database together with machine learning (ML) methods to construct a novel model for predicting the risk of DM in GBC patients.
Data on GBC patients from the SEER database (2000-2020) were divided into a training set and an internal test set in a 7:3 ratio. Univariate and multivariate logistic regression analyses were then applied to assess the risk factors for DM. Six ML algorithms were subsequently used to build predictive models on the selected features, each validated by ten-fold cross-validation on the training set. SHapley Additive exPlanations (SHAP) were used to interpret the selected models. Additionally, an online calculator based on the optimal ML model was developed to provide personalized DM risk assessment for GBC patients.
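The data partitioning described above (a 7:3 train/test split followed by ten-fold cross-validation on the training portion) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' code: the function names, the fixed random seed, the cohort size of 1000, and the use of simple unstratified shuffling are all hypothetical choices made for the example.

```python
import random

def train_test_split_indices(n, train_frac=0.7, seed=0):
    """Shuffle row indices 0..n-1 and cut them into train/test parts (assumed 7:3)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)      # seeded shuffle for reproducibility
    cut = round(n * train_frac)
    return idx[:cut], idx[cut:]

def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]   # k near-equal, disjoint folds
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val

# Hypothetical cohort of 1000 patients (the real sample size is not given here).
train_idx, test_idx = train_test_split_indices(1000)
folds = list(k_fold_indices(train_idx, k=10))
```

Each of the ten folds serves once as the validation set while the remaining nine are used for fitting; the held-out 30% is touched only for the final internal test.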
Seven key variables were incorporated into the ML models. The Extreme Gradient Boosting (XGB) model demonstrated high predictive performance [precision = 0.968; area under the curve (AUC) = 0.885]. T stage, N stage, grade, and age were identified as risk factors for DM in GBC patients. Conversely, rural-urban continuum, marital status, and median household income (inflation-adjusted to 2021) were identified as protective factors. Finally, a web calculator based on the optimal model was successfully constructed for personalized DM risk assessment.
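For readers unfamiliar with the two metrics reported above, the sketch below computes precision and AUC from scratch. It is a minimal illustration on made-up toy labels and scores, not the study's data or evaluation code; the 0.5 decision threshold and all variable names are assumptions. AUC is computed here as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half.

```python
def precision(y_true, y_pred):
    """Precision = TP / (TP + FP) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0

def roc_auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical): 1 = distant metastasis, 0 = none.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]           # predicted DM probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed 0.5 threshold

toy_precision = precision(y_true, y_pred)
toy_auc = roc_auc(y_true, scores)
```

Precision depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two values can differ substantially for the same model.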
The XGB model was the most effective for predicting DM in GBC patients. This model can assist in developing personalized treatment plans for patients at an early stage, thereby improving prognosis.