Division of Cardiac Surgery, Bristol Heart Institute, Translational Health Sciences, University of Bristol, Bristol, UK.
Alan Turing Institute, London, UK.
Eur J Cardiothorac Surg. 2023 Jun 1;63(6). doi: 10.1093/ejcts/ezad183.
To perform a systematic comparison of in-hospital mortality risk prediction after cardiac surgery between the predominant scoring system, the European System for Cardiac Operative Risk Evaluation (EuroSCORE) II; logistic regression (LR) retrained on the same variables; and alternative machine learning (ML) techniques: random forest (RF), neural networks (NN), XGBoost and weighted support vector machine.
Retrospective analyses were performed on prospectively, routinely collected data on adult patients undergoing cardiac surgery in the UK from January 2012 to March 2019. Data were temporally split 70:30 into training and validation subsets. Mortality prediction models were created using the 18 variables of EuroSCORE II. Comparisons of discrimination, calibration and clinical utility were then conducted. Changes in model performance and variable importance over time, and hospital- and operation-based model performance, were also reviewed.
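As a minimal sketch of what a temporal 70:30 split can look like (not the authors' code), the example below sorts records by date and assigns the earliest 70% to training and the remainder to validation. The function name `temporal_split`, the field name `surgery_date` and the toy records are assumptions made for illustration only.

```python
from datetime import date, timedelta


def temporal_split(records, date_key="surgery_date", train_percent=70):
    """Split records chronologically: earliest train_percent% -> training set.

    Hypothetical helper for illustration; field names are assumptions.
    """
    ordered = sorted(records, key=lambda r: r[date_key])
    # Integer arithmetic avoids floating-point surprises at the cut point.
    cut = len(ordered) * train_percent // 100
    return ordered[:cut], ordered[cut:]


# Toy records (fabricated for the example), deliberately out of date order.
records = [
    {"surgery_date": date(2012, 1, 1) + timedelta(days=200 * i), "died": 0}
    for i in reversed(range(10))
]

train, valid = temporal_split(records)
print(len(train), len(valid))
```

Splitting by time rather than at random means the validation subset contains only later operations than the training subset, which is what makes it possible to examine changes in performance over time.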
Of the 227 087 adults who underwent cardiac surgery during the study period, 6258 died (2.76%). In the testing cohort, discrimination improved with XGBoost [95% confidence interval (CI) for the area under the receiver operating characteristic curve (AUC), 0.834-0.834; F1 score, 0.276-0.280] and RF (95% CI AUC, 0.833-0.834; F1, 0.277-0.281) compared with EuroSCORE II (95% CI AUC, 0.817-0.818; F1, 0.243-0.245). There was no significant improvement in calibration with ML and retrained LR compared with EuroSCORE II. However, EuroSCORE II overestimated risk across all deciles of risk and over time. Calibration drift was lowest in NN, XGBoost and RF compared with EuroSCORE II. Decision curve analysis showed XGBoost and RF to have greater net benefit than EuroSCORE II.
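Two of the quantities mentioned above can be illustrated with a short sketch. It assumes the standard decision curve definition of net benefit at a threshold probability p_t, namely TP/n - (FP/n) x p_t/(1 - p_t), and a simple observed-to-expected (O/E) ratio as a summary of calibration, where a value below 1 indicates overestimated risk. The function names and toy data are hypothetical and do not come from the study.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * (threshold / (1 - threshold))


def observed_expected_ratio(y_true, y_prob):
    """Observed deaths divided by the sum of predicted risks (O/E)."""
    return sum(y_true) / sum(y_prob)


# Toy outcomes and predicted risks, fabricated for the example.
y_true = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.1, 0.6, 0.2, 0.1, 0.3, 0.2, 0.1, 0.1]

nb = net_benefit(y_true, y_prob, threshold=0.5)
oe = observed_expected_ratio(y_true, y_prob)
print(round(nb, 3), round(oe, 3))
```

A decision curve plots net benefit against a range of thresholds for each model; a model with higher net benefit at clinically relevant thresholds is preferred.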
ML techniques showed some statistical improvements over retrained LR and EuroSCORE II. The clinical impact of this improvement is modest at present. However, incorporating additional risk factors in future studies may improve upon these findings and warrants further study.