An explainable machine learning model for predicting the risk of distant metastasis in intrahepatic cholangiocarcinoma: a population-based cohort study.

Author information

Bi Jinzhe, Yu Yaqun

Affiliation

Department of Hepatobiliary and Pancreatic Surgery, Affiliated Hospital of Guilin Medical University, Guilin, 541001, China.

Publication information

Discov Oncol. 2025 Jun 18;16(1):1140. doi: 10.1007/s12672-025-02952-y.

Abstract

BACKGROUND

Distant metastasis (DM) in intrahepatic cholangiocarcinoma (ICC) is associated with poor prognosis and high mortality. An effective method for early prediction of DM risk is therefore crucial for tailoring personalized treatment plans and improving patient outcomes.

METHODS

This study included data from eligible ICC patients collected from the Surveillance, Epidemiology, and End Results (SEER) database between 2004 and 2021. Feature selection was performed using three methods: least absolute shrinkage and selection operator (LASSO) regression, the Boruta algorithm, and recursive feature elimination (RFE). Eight machine learning (ML) algorithms were used to develop predictive models. Model performance was evaluated and compared using measures including the area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (AUPRC), decision curve analysis (DCA), and calibration curves. The SHapley Additive exPlanations (SHAP) method was applied to rank feature importance and interpret the final model.
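
The evaluation measures named above can be illustrated with a minimal Python sketch. It is not the authors' code (the abstract does not say which software was used). The function names and toy data are assumptions for illustration only. AUC is computed as the probability that a randomly chosen positive case is scored above a randomly chosen negative case, the Brier score as the mean squared error of the predicted probabilities, and the DCA net benefit with the standard formula TP/n - (FP/n) * pt / (1 - pt) at a threshold probability pt.

```python
def roc_auc(y_true, y_prob):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit when treating cases with y_prob >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


# Toy data (hypothetical, not from the study): 1 = DM, 0 = no DM.
y_true = [1, 0, 1, 0, 0, 1]
y_prob = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4]

print("AUC:", roc_auc(y_true, y_prob))
print("Brier score:", brier_score(y_true, y_prob))
print("Net benefit at pt=0.5:", net_benefit(y_true, y_prob, 0.5))
```

A higher AUC and net benefit and a lower Brier score indicate better performance. The pairwise AUC loop is quadratic in the number of cases and is meant for clarity, not for a cohort of several thousand patients.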

RESULTS

This study included 8536 ICC patients, of whom 2816 (33%) had DM. The intersection of the features selected by the three methods yielded 10 predictive factors. Among the 8 ML models, the gradient boosting machine (GBM) model achieved the highest AUC (0.802), AUPRC (0.571), and accuracy (0.713), as well as the lowest Brier score (0.177), indicating comparatively robust overall performance. Calibration curves and DCA indicated that the GBM model has good predictive performance and clinical decision-making utility. SHAP analysis ranked the 10 features by relative importance as follows: surgery, N stage, tumor grade, T stage, tumor size, radiotherapy, tumor number, age at diagnosis, chemotherapy, and number of resected lymph nodes (LNs). Additionally, a web-based calculator was developed to predict the risk of DM in ICC patients, available at https://bijinzhe.shinyapps.io/icc_dm_shiny/.
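
The consensus step, keeping only the features chosen by all three selection methods, amounts to a set intersection. The sketch below is illustrative only. The ten shared names follow the features listed in the abstract, but the per-method selections and the extra candidates (sex, race, marital_status, primary_site) are hypothetical placeholders, since the abstract does not report what each method selected individually.

```python
# Features reported in the abstract, in the stated SHAP importance order.
reported_order = [
    "surgery", "n_stage", "tumor_grade", "t_stage", "tumor_size",
    "radiotherapy", "tumor_number", "age_at_diagnosis", "chemotherapy",
    "resected_lymph_nodes",
]

# Hypothetical per-method selections: each method is assumed to keep the ten
# features above plus a few placeholder extras that the others do not share.
lasso_selected = set(reported_order) | {"sex", "marital_status"}
boruta_selected = set(reported_order) | {"sex", "race"}
rfe_selected = set(reported_order) | {"primary_site"}

# Consensus set: features selected by all three methods.
consensus = lasso_selected & boruta_selected & rfe_selected

# Present the consensus features in the reported importance order.
final_features = [name for name in reported_order if name in consensus]

print(len(final_features), "features retained:")
for name in final_features:
    print(" -", name)
```

Taking the intersection is a conservative choice: a feature that only one or two methods favour (here, the placeholder "sex") is dropped, which trades some recall of candidate predictors for a smaller, more stable feature set.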

CONCLUSION

The GBM model demonstrated considerable potential in predicting the risk of DM in ICC patients. This could assist clinicians in formulating personalized treatment strategies, ultimately improving the overall prognosis of ICC patients.
