Wang Jiuyi, Kang Qingxia, Tian Shiqi, Zhang Shunli, Wang Kai, Feng Guibo
Department of General Medicine, The Affiliated Yongchuan Hospital of Chongqing Medical University, Chongqing 402160, China.
Department of Cardiology, The Affiliated Yongchuan Hospital of Chongqing Medical University, Chongqing 402160, China.
Bioengineering (Basel). 2025 May 12;12(5):511. doi: 10.3390/bioengineering12050511.
Heart failure (HF) ranks among the foremost causes of mortality globally, with particularly high prevalence and impact in intensive care units (ICUs). This study sought to develop, validate, and deploy a time-dependent machine learning model for predicting one-year all-cause mortality risk in ICU patients diagnosed with HF, thereby facilitating precise prognostic evaluation and risk stratification. The cohort comprised 8960 ICU patients with HF drawn from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database (version 3.1). Version 3.1 adds data from 2020 to 2022 to version 2.2 (which covers 2008 to 2019); therefore, data spanning 2008 to 2019 (n = 5748) were designated as the training set, while data from 2020 to 2022 (n = 3212) were reserved as the test set. The primary endpoint was one-year all-cause mortality. Least Absolute Shrinkage and Selection Operator (LASSO) regression was employed to select predictive features from an initial pool of 64 candidate variables (including demographic characteristics, vital signs, comorbidities and complications, therapeutic interventions, routine laboratory data, and disease severity scores). Four predictive models were developed and compared: Cox proportional hazards, random survival forest (RSF), Cox proportional hazards deep neural network (DeepSurv), and eXtreme Gradient Boosting (XGBoost). Model performance was assessed using the concordance index (C-index) and Brier score, with model interpretability addressed through SHapley Additive exPlanations (SHAP) and time-dependent Survival SHapley Additive exPlanations (SurvSHAP(t)). The one-year mortality rate in the study population was 46.1%. In the training set, LASSO selected 24 features for inclusion in the model.
In the test set, the XGBoost model exhibited superior predictive performance, as evidenced by a C-index of 0.772 and a Brier score of 0.161, outperforming the Cox model (C-index: 0.740, Brier score: 0.175), the RSF model (C-index: 0.747, Brier score: 0.178), and the DeepSurv model (C-index: 0.723, Brier score: 0.183). Decision curve analysis validated the clinical utility of the XGBoost model across a broad spectrum of risk thresholds. Feature importance analysis identified the red cell distribution width-to-albumin ratio (RAR), Charlson Comorbidity Index, Simplified Acute Physiology Score II (SAPS II), Acute Physiology Score III (APS III), and the age-bilirubin-INR-creatinine (ABIC) score as the top five predictive factors. Consequently, an online risk prediction tool based on this model has been developed and is publicly accessible. The time-dependent XGBoost model demonstrated robust predictive capability in evaluating the one-year all-cause mortality risk in critically ill HF patients. This model offered a useful tool for early risk identification and supported timely interventions.
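The two evaluation metrics reported above can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the five-patient toy data, and the tie handling are assumptions made for illustration. The Brier score here is the simplest form, valid only when no patient is censored before the horizon; the abstract does not state which estimator was used, and a censoring-weighted version would be needed otherwise.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored data.

    times  : follow-up time per patient
    events : 1 if death was observed, 0 if censored
    risks  : model risk score (higher means earlier death expected)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs tied in time are skipped
        # a is the patient with the shorter follow-up time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # earlier patient was censored: order is unknown
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def brier_score_at(horizon, times, events, predicted_survival):
    """Brier score at a fixed horizon, assuming complete follow-up.

    predicted_survival : predicted probability of being alive at `horizon`
    """
    total = 0.0
    for t, e, s in zip(times, events, predicted_survival):
        if t <= horizon and not e:
            raise ValueError("censored before horizon; weighting required")
        alive = 1.0 if t > horizon else 0.0
        total += (alive - s) ** 2
    return total / len(times)


# Hypothetical toy data: follow-up in days, horizon of one year.
times = [30, 120, 250, 400, 400]
events = [1, 1, 1, 0, 0]
risks = [0.9, 0.4, 0.7, 0.2, 0.3]
survival_1y = [0.1, 0.6, 0.3, 0.8, 0.7]

c_index = harrell_c_index(times, events, risks)
brier = brier_score_at(365, times, events, survival_1y)
print(f"C-index: {c_index:.3f}, Brier score at 1 year: {brier:.3f}")
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, while a lower Brier score indicates better-calibrated probabilities, which is the sense in which the XGBoost figures above compare favourably with the other three models.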