Narrative Feature or Structured Feature? A Study of Large Language Models to Identify Cancer Patients at Risk of Heart Failure.

Authors

Chen Ziyi, Zhang Mengyuan, Ahmed Mustafa Mohammed, Guo Yi, George Thomas J, Bian Jiang, Wu Yonghui

Affiliations

Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA.

Division of Cardiovascular Medicine, Department of Medicine, College of Medicine, University of Florida, Gainesville, FL, USA.

Publication information

AMIA Annu Symp Proc. 2025 May 22;2024:242-251. eCollection 2024.

Abstract

Cancer treatments are known to cause cardiotoxicity, negatively impacting outcomes and survivorship. Identifying cancer patients at risk of heart failure (HF) is critical to improving cancer treatment outcomes and safety. This study examined machine learning (ML) models for identifying cancer patients at risk of HF from electronic health records (EHRs), including traditional ML, time-aware long short-term memory (T-LSTM), and large language models (LLMs) that use novel narrative features derived from structured medical codes. We identified a cohort of 12,806 patients from University of Florida Health diagnosed with lung, breast, or colorectal cancer, of whom 1,602 developed HF after their cancer diagnosis. The LLM GatorTron-3.9B achieved the best F1 score, outperforming traditional support vector machines by 39%, the T-LSTM deep learning model by 7%, and BERT, a widely used transformer model, by 5.6%. The analysis shows that the proposed narrative features substantially increased feature density and improved performance.
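The abstract does not say how the narrative features were built from structured medical codes, so the following is only a minimal sketch of the general idea: rendering dated diagnosis codes as plain-language sentences that a language model could read. The function name `codes_to_narrative`, the sentence template, and the small lookup table are illustrative assumptions, not the authors' method.

```python
from datetime import date

# Hypothetical lookup table mapping ICD-10-CM codes to short descriptions.
# A real pipeline would draw these from a full terminology resource.
CODE_DESCRIPTIONS = {
    "I10": "essential (primary) hypertension",
    "E11.9": "type 2 diabetes mellitus without complications",
}


def codes_to_narrative(events, index_date):
    """Render (event_date, code) pairs as sentences in chronological order.

    `index_date` stands in for the cancer diagnosis date (an assumption made
    for this sketch); each sentence states how many days before it the
    coded event was recorded.
    """
    sentences = []
    for event_date, code in sorted(events):
        # Fall back to the raw code when no description is available.
        description = CODE_DESCRIPTIONS.get(code, f"code {code}")
        days_before = (index_date - event_date).days
        sentences.append(
            f"{days_before} days before the cancer diagnosis, "
            f"the patient had {description}."
        )
    return " ".join(sentences)


if __name__ == "__main__":
    example_events = [
        (date(2020, 1, 11), "I10"),
        (date(2020, 1, 1), "E11.9"),
    ]
    print(codes_to_narrative(example_events, date(2020, 1, 31)))
```

Text produced this way could then be tokenized and passed to a transformer classifier. The actual template, time encoding, and code vocabulary used in the study may differ.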
