Shi Boqun, Chen Liangguo, Pang Shuo, Wang Yue, Wang Shen, Li Fadong, Zhao Wenxin, Guo Pengrong, Zhang Leli, Fan Chu, Zou Yi, Wu Xiaofan
Department of Cardiology, Beijing Anzhen Hospital, Capital Medical University, Beijing, China.
J Med Internet Res. 2025 May 12;27:e67253. doi: 10.2196/67253.
Accurate mortality risk prediction is crucial for effective cardiovascular risk management. Recent advancements in artificial intelligence (AI) have demonstrated potential in this specific medical field. Qwen-2 and Llama-3 are high-performance, open-source large language models (LLMs) available online. An artificial neural network (ANN) algorithm derived from the SWEDEHEART (Swedish Web System for Enhancement and Development of Evidence-Based Care in Heart Disease Evaluated According to Recommended Therapies) registry, termed SWEDEHEART-AI, can predict patient prognosis following acute myocardial infarction (AMI).
This study aims to evaluate these 3 models (SWEDEHEART-AI, Qwen-2, and Llama-3) in predicting 1-year all-cause mortality in critically ill patients with AMI.
The Medical Information Mart for Intensive Care IV (MIMIC-IV) database is a publicly available data set in critical care medicine. We included 2758 patients who were first admitted for AMI and discharged alive. SWEDEHEART-AI calculated each patient's mortality risk from 21 clinical variables. Qwen-2 and Llama-3 analyzed the content of each patient's discharge record and directly returned a value between 0 and 1, reported to 1 decimal place, representing the probability of death within 1 year. The patients' actual mortality was verified using follow-up data. The predictive performance of the 3 models was assessed and compared using the Harrell C-statistic (C-index), the area under the receiver operating characteristic curve (AUROC), calibration plots, Kaplan-Meier curves, and decision curve analysis.
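As an illustration of the discrimination metric named above, the following is a minimal sketch of the Harrell C-index for right-censored follow-up data. It is not the study's code: the function name `harrell_c_index` and the toy inputs are hypothetical, pairs with tied follow-up times are simply skipped, and tied risk scores are credited as half-concordant. That last convention matters for scores reported to 1 decimal place, where ties are frequent.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell C-index: share of comparable pairs ranked correctly.

    times  -- follow-up time for each patient
    events -- 1 if death was observed, 0 if the patient was censored
    risks  -- predicted risk (higher should mean earlier death)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            # Comparable pair: patient i died strictly before patient j's
            # last known follow-up time (simplification: time ties skipped).
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third one censored.
toy_times = [1, 2, 3, 4]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.5, 0.5, 0.1]
toy_c = harrell_c_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. For real analyses, an established survival library is preferable to this quadratic-time loop.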
SWEDEHEART-AI demonstrated strong discrimination in predicting 1-year all-cause mortality in patients with AMI, with a higher C-index than Qwen-2 and Llama-3 (C-index 0.72, 95% CI 0.69-0.74 vs 0.65, 95% CI 0.62-0.67 vs 0.56, 95% CI 0.53-0.58, respectively; P<.001 for both comparisons). SWEDEHEART-AI also showed high and consistent AUROC in the time-dependent ROC curve. The death rates calculated by SWEDEHEART-AI were positively correlated with actual mortality, and the 3 risk classes derived from this model showed clear differentiation in the Kaplan-Meier curve (P<.001). Calibration plots indicated that SWEDEHEART-AI tended to overestimate mortality risk, with an observed-to-expected ratio of 0.478; that is, fewer than half as many deaths were observed as the model predicted. Compared with the LLMs, SWEDEHEART-AI demonstrated positive and greater net benefits at risk thresholds below 19%.
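The calibration and decision-curve quantities reported above can be sketched as follows. This is an illustrative simplification, not the study's analysis: it treats 1-year death as a fully observed binary outcome (no censoring), the function names and toy data are hypothetical, and the net benefit uses the standard decision-curve formula TP/n - (FP/n) * pt / (1 - pt), where pt is the risk threshold.

```python
from typing import Sequence


def observed_to_expected(outcomes: Sequence[int], predicted: Sequence[float]) -> float:
    """Observed deaths divided by the sum of predicted risks.

    A ratio below 1 means the model predicts more deaths than occurred,
    that is, it overestimates risk.
    """
    expected = sum(predicted)
    if expected == 0:
        raise ValueError("sum of predicted risks is zero")
    return sum(outcomes) / expected


def net_benefit(
    outcomes: Sequence[int],
    predicted: Sequence[float],
    threshold: float,
) -> float:
    """Net benefit of treating every patient whose predicted risk >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    flagged = [(y, p) for y, p in zip(outcomes, predicted) if p >= threshold]
    true_pos = sum(1 for y, _ in flagged if y == 1)
    false_pos = sum(1 for y, _ in flagged if y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


# Toy data (hypothetical): one death among four patients.
toy_outcomes = [1, 0, 0, 0]
toy_predicted = [0.8, 0.6, 0.4, 0.2]
toy_oe = observed_to_expected(toy_outcomes, toy_predicted)
toy_nb = net_benefit(toy_outcomes, toy_predicted, threshold=0.5)
```

In the toy data the predicted risks sum to about 2 expected deaths against 1 observed, so the ratio is about 0.5, the same direction of miscalibration as the reported 0.478. Plotting `net_benefit` across a range of thresholds gives a decision curve.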
SWEDEHEART-AI, a trained ANN model, demonstrated the best performance, with strong discrimination and clinical utility in predicting 1-year all-cause mortality in patients with AMI from an intensive care cohort. Among the LLMs, Qwen-2 outperformed Llama-3 and showed moderate predictive value. Qwen-2 and SWEDEHEART-AI exhibited comparable classification effectiveness. The future integration of LLMs into clinical decision support systems holds promise for accurate risk stratification in patients with AMI; however, further research is needed to optimize LLM performance and address calibration issues across diverse patient populations.