A predictive model for hospital death in cancer patients with acute pulmonary embolism using XGBoost machine learning and SHAP interpretation.

Authors

Yuan Zhen-Nan, Xue Yu-Juan, Wang Hai-Jun, Qu Shi-Ning, Huang Chu-Lin, Wang Hao, Zhang Hao, Zhang Min-Ze, Xing Xue-Zhong

Affiliations

Department of Intensive Care Unit, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, 100021, China.

Department of Pediatrics, Peking University People's Hospital, Peking University, Beijing, China.

Publication

Sci Rep. 2025 May 25;15(1):18268. doi: 10.1038/s41598-025-02072-1.

DOI: 10.1038/s41598-025-02072-1

PMID: 40414906

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12104392/

Abstract

Predicting in-hospital mortality in cancer patients with acute pulmonary embolism (APE) remains a significant clinical challenge. This study aimed to develop and validate an XGBoost machine learning model to predict in-hospital mortality in this vulnerable population. A retrospective cohort study was conducted using the MIMIC-IV 2.2 database and external data collected between May 1, 2021, and April 30, 2023, from the intensive care unit of the Cancer Hospital, Chinese Academy of Medical Sciences. A total of 448 cancer patients with APE were included from the MIMIC-IV 2.2 database and divided into a training set (70%, n = 314) and an internal validation set (30%, n = 134). The external validation cohort consisted of 56 patients. An XGBoost model was trained, and the SHAP (SHapley Additive exPlanations) method was used to identify the top 10 predictors of in-hospital mortality: Glasgow Coma Scale (GCS) score, albumin, platelet count, age, serum creatinine, hemoglobin, presence of metastasis, lactate, creatine kinase (CK), and cancer type. The XGBoost model achieved an area under the ROC curve (AUC) of 0.806 (95% CI: 0.717-0.896) in the internal validation set and 0.724 (95% CI: 0.686-0.901) in the external validation set. Calibration curves indicated good model fit, and decision curve analysis (DCA) showed high clinical benefit in both the internal and external validation cohorts. The SHAP-interpreted XGBoost model effectively predicts in-hospital mortality in cancer patients with APE. It can support clinical decision-making and may improve patient outcomes through early intervention and personalized treatment strategies. Further validation in diverse clinical settings is warranted to confirm its generalizability.
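The abstract does not state how the "top 10 predictors" were ranked. A common convention, assumed here, is to average each feature's absolute SHAP value over all patients and sort in descending order. The sketch below implements only that ranking step in plain Python; in practice the SHAP matrix would typically come from `shap.TreeExplainer(model).shap_values(X)` for a fitted XGBoost classifier. The function name `rank_by_mean_abs_shap` and the toy numbers are hypothetical, not the authors' code or study data.

```python
# Hypothetical helper: ranks features by mean absolute SHAP value, an assumed
# convention for "top predictors" that the abstract does not confirm.
def rank_by_mean_abs_shap(shap_values, feature_names, top_k=10):
    """Return (feature, mean |SHAP|) pairs, largest first, truncated to top_k.

    shap_values: one row per patient, one column per feature (for a binary
    XGBoost model these are typically contributions on the log-odds scale).
    """
    if not shap_values:
        raise ValueError("need at least one row of SHAP values")
    n_features = len(feature_names)
    if any(len(row) != n_features for row in shap_values):
        raise ValueError("each row must have one value per feature")

    n = len(shap_values)
    importance = [
        (name, sum(abs(row[j]) for row in shap_values) / n)
        for j, name in enumerate(feature_names)
    ]
    # Larger mean |SHAP| means a larger average influence on predicted risk,
    # regardless of whether the feature pushes risk up or down.
    importance.sort(key=lambda item: item[1], reverse=True)
    return importance[:top_k]


# Toy 2-patient x 3-feature matrix (made-up numbers, not study data).
toy_shap = [[0.5, -0.1, 0.0], [-0.3, 0.2, 0.05]]
print(rank_by_mean_abs_shap(toy_shap, ["GCS", "albumin", "CK"], top_k=2))
```

Ranking by the mean of absolute values rather than raw values keeps a feature that raises risk in some patients and lowers it in others from cancelling itself out.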
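The abstract reports each AUC with a 95% CI but does not say how the intervals were obtained. As a minimal sketch, assuming a percentile bootstrap over patients (one common choice; DeLong's method is another), the AUC below is computed as the Mann-Whitney probability that a randomly chosen patient who died receives a higher predicted risk than a randomly chosen survivor. The function names, `n_boot`, and `seed` are illustrative assumptions, not the study's settings.

```python
import random


def auc(y_true, scores):
    """ROC AUC as P(score of a death > score of a survivor); ties count 0.5."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus an assumed percentile-bootstrap (1 - alpha) CI."""
    point = auc(y_true, scores)  # also validates that both classes exist
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        # Resample patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples containing only one class
            stats.append(auc(yb, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lo, hi


# Toy labels (1 = in-hospital death) and predicted risks, not study data.
y = [0, 0, 1, 1]
risk = [0.1, 0.4, 0.35, 0.8]
print(auc(y, risk))  # 3 of 4 death/survivor pairs ranked correctly -> 0.75
print(bootstrap_auc_ci(y, risk, n_boot=500))
```

With small validation cohorts such as the 56-patient external set, bootstrap intervals can be wide and need not be symmetric around the point estimate.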
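Decision curve analysis compares a model with the default strategies of treating all patients or none, using net benefit at a threshold probability p_t: net benefit = TP/n - (FP/n) * p_t / (1 - p_t), where a patient is flagged when predicted risk >= p_t. Treat-none always has net benefit 0. The sketch below is a plain-Python illustration of this standard definition, not the authors' implementation; the function names and toy data are hypothetical.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be in (0, 1)")
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    # Odds of the threshold weight false positives against true positives.
    odds = threshold / (1 - threshold)
    return tp / n - fp / n * odds


def treat_all_net_benefit(y_true, threshold):
    """Net benefit of flagging every patient (the treat-all reference line)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def decision_curve(y_true, risk, thresholds):
    """(threshold, model NB, treat-all NB, treat-none NB) for each threshold."""
    return [
        (t, net_benefit(y_true, risk, t), treat_all_net_benefit(y_true, t), 0.0)
        for t in thresholds
    ]


# Toy labels (1 = in-hospital death) and predicted risks, not study data.
y = [1, 1, 0, 0, 0]
risk = [0.9, 0.6, 0.7, 0.2, 0.1]
for row in decision_curve(y, risk, [0.1, 0.3, 0.5]):
    print(row)
```

A model shows clinical benefit at a given threshold when its net benefit exceeds both the treat-all and treat-none curves there.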
