Performance Drift in Machine Learning Models for Cardiac Surgery Risk Prediction: Retrospective Analysis.

Author information

Dong Tim, Sinha Shubhra, Zhai Ben, Fudulu Daniel, Chan Jeremy, Narayan Pradeep, Judge Andy, Caputo Massimo, Dimagli Arnaldo, Benedetto Umberto, Angelini Gianni D

Affiliations

Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol, United Kingdom.

School of Computing Science, Northumbria University, Newcastle upon Tyne, United Kingdom.

Publication information

JMIRx Med. 2024 Jun 12;5:e45973. doi: 10.2196/45973.

DOI:10.2196/45973

PMID:38889069

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11217160/

Abstract

BACKGROUND

The Society of Thoracic Surgeons and European System for Cardiac Operative Risk Evaluation (EuroSCORE) II risk scores are the most commonly used risk prediction models for in-hospital mortality after adult cardiac surgery. However, they are prone to miscalibration over time and poor generalization across data sets; thus, their use remains controversial. Despite increased interest, a gap in understanding the effect of data set drift on the performance of machine learning (ML) over time remains a barrier to its wider use in clinical practice. Data set drift occurs when an ML system underperforms because of a mismatch between the data it was developed from and the data on which it is deployed.
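
As a purely illustrative sketch of the mismatch described above, the snippet below compares one variable between a development sample and a deployment sample using a standardized mean difference. This is not the drift measure used in the study, which the abstract does not specify; the function name, the `age` variable, and the values are hypothetical.

```python
import math
from statistics import mean, pstdev

def standardized_mean_difference(dev_values, deploy_values):
    """Difference in means between two samples, scaled by their pooled SD.

    Illustrative only: one simple way to quantify how far a single variable
    has shifted between development and deployment data.
    """
    diff = mean(deploy_values) - mean(dev_values)
    pooled_sd = math.sqrt((pstdev(dev_values) ** 2 + pstdev(deploy_values) ** 2) / 2)
    if pooled_sd == 0:
        # Both samples are constant: no shift if means match, unbounded otherwise.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled_sd

# Synthetic ages (assumed values, not patient data).
dev_age = [60, 62, 64, 66, 68]
deploy_age = [70, 72, 74, 76, 78]

smd = standardized_mean_difference(dev_age, deploy_age)
print(f"standardized mean difference: {smd:.2f}")
```

A value near 0 suggests the variable is distributed similarly in both samples; larger absolute values indicate a larger shift. In practice such a check would be repeated per variable and per time window.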

OBJECTIVE

In this study, we analyzed the extent of performance drift using models built on a large UK cardiac surgery database. The objectives were to (1) rank and assess the extent of performance drift in cardiac surgery risk ML models over time and (2) investigate any potential influence of data set drift and variable importance drift on performance drift.

METHODS

We conducted a retrospective analysis of prospectively, routinely gathered data on adult patients undergoing cardiac surgery in the United Kingdom between 2012 and 2019. We temporally split the data 70:30 into a training and validation set and a holdout set. Five novel ML mortality prediction models were developed and assessed, along with EuroSCORE II, for relationships between and within variable importance drift, performance drift, and actual data set drift. Performance was assessed using a consensus metric.
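
A temporal 70:30 split of the kind described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the record layout, the `surgery_date` field, and the function name are hypothetical, and the records are synthetic.

```python
from datetime import date

def temporal_split(records, date_key="surgery_date", train_fraction=0.7):
    """Sort records by date and split them into an earlier and a later part.

    The earlier part would serve as the training and validation set, the
    later part as the holdout set, so no holdout record predates training data.
    """
    ordered = sorted(records, key=lambda r: r[date_key])
    cut = round(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

# Ten synthetic records in shuffled order (assumed layout, not patient data).
records = [
    {"surgery_date": date(2012 + (i * 3) % 8, 1 + i, 1), "died": 0}
    for i in range(10)
]

train_val, holdout = temporal_split(records)
print(len(train_val), len(holdout))
```

Unlike a random split, this keeps the holdout set strictly later in time, which is what allows performance drift over time to be observed.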

RESULTS

A total of 227,087 adults underwent cardiac surgery during the study period, with a mortality rate of 2.76% (n=6258). There was strong evidence of a decrease in overall performance across all models (P<.0001). Extreme gradient boosting (clinical effectiveness metric [CEM] 0.728, 95% CI 0.728-0.729) and random forest (CEM 0.727, 95% CI 0.727-0.728) were the overall best-performing models, both temporally and nontemporally. EuroSCORE II performed the worst across all comparisons. Sharp changes in variable importance and data set drift from October to December 2017, from June to July 2018, and from December 2018 to February 2019 mirrored the effects of performance decrease across models.

CONCLUSIONS

All models show a decrease in at least 3 of the 5 individual metrics. CEM and variable importance drift detection demonstrate the limitation of logistic regression methods used for cardiac surgery risk prediction and the effects of data set drift. Future work will be required to determine the interplay between ML models and whether ensemble models could improve on their respective performance advantages.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1904/11217160/41060e69dd9d/xmed-v5-e45973-g001.jpg
