Franco-Moreno Anabel, Madroñal-Cerezo Elena, de Ancos-Aracil Cristina Lucía, Farfán-Sedano Ana Isabel, Muñoz-Rivas Nuria, Bascuñana Morejón-Girón José, Ruiz-Giardín José Manuel, Álvarez-Rodríguez Federico, Prada-Alonso Jesús, Gala-García Yvonne, Casado-Suela Miguel Ángel, Bustamante-Fermosel Ana, Alfaro-Fernández Nuria, Torres-Macho Juan
Department of Internal Medicine, Hospital Universitario Infanta Leonor-Virgen de la Torre, 28031 Madrid, Spain.
Venous Thromboembolism Unit, Hospital Universitario Infanta Leonor-Virgen de la Torre, Gran Via del Este Avenue, 80, 28031 Madrid, Spain.
Medicina (Kaunas). 2024 Dec 27;61(1):18. doi: 10.3390/medicina61010018.
Background and Objectives: Venous thromboembolism (VTE) can be the first manifestation of an underlying cancer. This study aimed to use machine learning (ML) to develop a predictive model for the risk of occult cancer between 30 days and 24 months after a venous thrombotic event.
Materials and Methods: We designed a case-control study nested in a cohort of patients with VTE included in a prospective registry from two Spanish hospitals between 2005 and 2021. Both clinically driven and ML-driven feature selection were performed to identify predictors of occult cancer. XGBoost, LightGBM, and CatBoost algorithms were used to train different prediction models, which were subsequently validated in a hold-out dataset.
Results: A total of 815 patients with VTE were included (51.5% male; median age 59 years). During follow-up, 56 patients (6.9%) were diagnosed with cancer. One hundred and twenty-one variables were explored for the predictive analysis. CatBoost achieved the best performance metrics among the ML models analyzed. The top 15 variables for predicting hidden malignancy in the final CatBoost model were age, gender, systolic blood pressure, heart rate, weight, chronic lung disease, D-dimer, alanine aminotransferase, hemoglobin, serum creatinine, cholesterol, platelets, triglycerides, leukocyte count, and previous VTE. The model had an ROC-AUC of 0.86 (95% CI, 0.83-0.87) in the test set. Sensitivity, specificity, negative predictive value, and positive predictive value were 62%, 94%, 93%, and 75%, respectively.
Conclusions: This is the first risk score developed with ML tools for identifying patients with VTE who are at increased risk of occult cancer, and it achieved high diagnostic accuracy. The study's limitations include potential information bias from electronic health records and the small number of cancer cases. In addition, variability in detection protocols and evolving clinical practices may affect model accuracy. The score needs external validation.
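The four diagnostic metrics reported in the Results (sensitivity, specificity, positive predictive value, and negative predictive value) all derive from the confusion matrix of a binary classifier evaluated on a hold-out set. The following is a minimal sketch of those standard definitions, not the authors' code. The `ConfusionCounts` class, the `diagnostic_metrics` function, and the counts used in the example are hypothetical and chosen only for illustration; they are not data from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary classifier on a hold-out set (hypothetical helper)."""
    tp: int  # cancer cases correctly flagged
    fp: int  # non-cancer patients incorrectly flagged
    tn: int  # non-cancer patients correctly not flagged
    fn: int  # cancer cases missed


def diagnostic_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the four standard diagnostic metrics as fractions in [0, 1].

    Assumes every denominator is non-zero, i.e. the hold-out set contains
    both classes and the model predicts both classes at least once.
    """
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
    }


# Illustrative counts only; these are NOT taken from the study.
example = ConfusionCounts(tp=15, fp=5, tn=70, fn=10)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity are properties of the classifier at a given decision threshold, whereas the predictive values also depend on the proportion of cancer cases in the evaluated sample, which is one reason external validation in a population with a different case mix matters.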