Zheng Xingyue, Wu Liuyun, Li Lian, Wang Yin, Yin Qinan, Han Lizhu, Wu Xingwei, Bian Yuan
Department of Pharmacy, Personalized Drug Therapy Key Laboratory of Sichuan Province, Sichuan Academy of Medical Sciences and Sichuan Provincial People's Hospital, School of Medicine, University of Electronic Science and Technology of China, Chengdu, China.
Front Pharmacol. 2025 Feb 28;16:1448879. doi: 10.3389/fphar.2025.1448879. eCollection 2025.
This study focuses on the risk of venous thromboembolism (VTE) in patients with gastric or esophageal cancer (GC/EC), investigating the risk factors for VTE in this population. Utilizing machine learning techniques, the research aims to develop an interpretable VTE risk prediction model. The goal is to identify patients with gastric or esophageal cancer who are at high risk of VTE at an early stage in clinical practice, thereby enabling precise anticoagulant prophylaxis and thrombus management.
This real-world study aimed to predict VTE in patients with GC/EC. Data were collected from inpatients diagnosed with GC/EC at Sichuan Provincial People's Hospital between 1 January 2018 and 30 June 2023. Using nine supervised learning algorithms, 576 prediction models were developed from 56 available variables. A simplified modeling approach was then applied using the top 12 feature variables from the best-performing model. The primary metric for assessing predictive performance was the area under the receiver operating characteristic (ROC) curve (AUC). In addition, the training data used to construct the best model were used to externally validate four existing risk assessment models: the Padua, Caprini, Khorana, and COMPASS-CAT scores.
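The AUC used as the primary metric can be read as the probability that a randomly chosen patient who developed VTE receives a higher predicted risk than a randomly chosen patient who did not. The following is a minimal Python sketch of that rank-based (Mann-Whitney) formulation; the function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not the study's code or data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC (Mann-Whitney formulation).

    Returns the fraction of (positive, negative) pairs in which the positive
    case (label 1, e.g. VTE) has the higher score; tied pairs count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # pair ranked correctly
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = developed VTE, 0 = did not.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))  # 5 of 6 pairs ranked correctly
```

In practice a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently; the double loop here is O(n_pos × n_neg) and is only meant to make the definition explicit.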
After duplicate visit records were excluded, data on 3,742 GC/EC patients were collected. Of these, 861 (23.0%) were included in the study, and 124 (14.4%) of the included patients developed VTE. The five models with the highest AUC in full-variable modeling were GBoost (0.9646), Logistic Regression (0.9443), AdaBoost (0.9382), CatBoost (0.9354), and XGBoost (0.8097). For simplified modeling, the top five were Simp-CatBoost (0.8811), Simp-GBoost (0.8771), Simp-Random Forest (0.8736), Simp-AdaBoost (0.8263), and Simp-Logistic Regression (0.8090). After weighing predictive performance against practicality, Simp-GBoost was selected as the best model for this study. External validation of the Padua, Caprini, Khorana, and COMPASS-CAT scores on the training set of the Simp-GBoost model yielded AUCs of 0.4367, 0.2900, 0.5000, and 0.3633, respectively.
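Several of the external-validation AUCs above are at or below 0.5. Under the rank-based definition of AUC, a score that assigns every patient the same value produces exactly 0.5, and a value below 0.5 means the score ranked non-VTE patients above VTE patients more often than the reverse, so negating it would give 1 − AUC. The sketch below checks both properties using hypothetical integer point scores, not real Padua, Caprini, Khorana, or COMPASS-CAT values; the abstract does not state why each published score performed as it did, and a sub-0.5 AUC does not by itself imply that reversing a score is clinically meaningful.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: share of (VTE, non-VTE) pairs ordered correctly, ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Each pair contributes 1 if correctly ordered, 0.5 if tied, 0 otherwise.
    pairs = [(p > n) + 0.5 * (p == n) for p in pos for n in neg]
    return sum(pairs) / len(pairs)


# Hypothetical integer point scores (higher = predicted higher VTE risk).
labels = [1, 0, 1, 0, 0, 1]
points = [0, 2, 1, 3, 1, 2]

auc = roc_auc(labels, points)                      # 2/9, below 0.5
flipped = roc_auc(labels, [-x for x in points])    # 7/9 == 1 - auc
constant = roc_auc(labels, [3] * len(labels))      # every pair tied -> 0.5
print(f"{auc:.4f} {flipped:.4f} {constant:.4f}")
```

Because ties stay ties when a score is negated, the identity AUC(−s) = 1 − AUC(s) holds exactly under this tie convention, which is why a coarse point score can land on or near 0.5 when many patients share the same value.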
In this study, we analyzed the risk factors for VTE in GC/EC patients and constructed a well-performing VTE risk prediction model that can accurately identify each patient's level of VTE risk. Four existing VTE risk scoring systems were externally validated on this study's dataset. The results demonstrated that the VTE risk prediction model established in this study has greater clinical utility for patients with GC/EC. The Simp-GBoost model can provide intelligent assistance in the early clinical assessment of VTE risk in these patients.