Chen Ziyi, Zhang Mengyuan, Ahmed Mustafa Mohammed, Guo Yi, George Thomas J, Bian Jiang, Wu Yonghui
Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA.
Division of Cardiovascular Medicine, Department of Medicine, College of Medicine, University of Florida, Gainesville, FL, USA.
AMIA Annu Symp Proc. 2025 May 22;2024:242-251. eCollection 2024.
Cancer treatments are known to cause cardiotoxicity, which negatively impacts outcomes and survivorship. Identifying cancer patients at risk of heart failure (HF) is critical to improving cancer treatment outcomes and safety. This study examined machine learning (ML) models for identifying cancer patients at risk of HF from electronic health records (EHRs). The models included traditional ML, time-aware long short-term memory (T-LSTM), and large language models (LLMs) using novel narrative features derived from structured medical codes. We identified a cohort of 12,806 patients at University of Florida Health diagnosed with lung, breast, or colorectal cancer, of whom 1,602 developed HF after cancer. The LLM GatorTron-3.9B achieved the best F1 scores, outperforming traditional support vector machines by 39%, the T-LSTM deep learning model by 7%, and a widely used transformer model, BERT, by 5.6%. The analysis shows that the proposed narrative features markedly increased feature density and improved performance.