ChatGPT-4从电子健康记录中提取心力衰竭症状和体征

ChatGPT-4 extraction of heart failure symptoms and signs from electronic health records.

作者信息

Workman T Elizabeth, Ahmed Ali, Sheriff Helen M, Raman Venkatesh K, Zhang Sijian, Shao Yijun, Faselis Charles, Fonarow Gregg C, Zeng-Treitler Qing

机构信息

Washington DC VA Medical Center, Washington DC, USA; The George Washington University, Washington DC, USA.

Washington DC VA Medical Center, Washington DC, USA; The George Washington University, Washington DC, USA; Georgetown University, Washington DC, USA.

出版信息

Prog Cardiovasc Dis. 2024 Nov-Dec;87:44-49. doi: 10.1016/j.pcad.2024.10.010. Epub 2024 Oct 21.

DOI:10.1016/j.pcad.2024.10.010

PMID:39442600

Abstract

BACKGROUND

Natural language processing (NLP) can facilitate research utilizing data from electronic health records (EHRs). Large language models can potentially improve NLP applications leveraging EHR notes. The objective of this study was to assess the performance of zero-shot learning using Chat Generative Pre-trained Transformer 4 (ChatGPT-4) for extraction of symptoms and signs, and compare its performance to baseline machine learning and rule-based methods developed using annotated data.

METHODS AND RESULTS

From unstructured clinical notes of the national EHR data of the Veterans healthcare system, we extracted 1999 text snippets containing relevant keywords for heart failure symptoms and signs, which were then annotated by two clinicians. We also created 102 synthetic snippets that were semantically similar to snippets randomly selected from the original 1999 snippets. The authors applied zero-shot learning, using two different forms of prompt engineering in a symptom and sign extraction task with ChatGPT-4, utilizing the synthetic snippets. For comparison, baseline models using machine learning and rule-based methods were trained using the original 1999 annotated text snippets, and then used to classify the 102 synthetic snippets. The best zero-shot learning application achieved 90.6 % precision, 100 % recall, and 95 % F1 score, outperforming the best baseline method, which achieved 54.9 % precision, 82.4 % recall, and 65.5 % F1 score. Prompt style and temperature settings influenced zero-shot learning performance.

CONCLUSIONS

Zero-shot learning utilizing ChatGPT-4 significantly outperformed traditional machine learning and rule-based NLP. Prompt type and temperature settings affected zero-shot learning performance. These findings suggest a more efficient means of symptoms and signs extraction than traditional machine learning and rule-based methods.

摘要

背景

自然语言处理（NLP）有助于利用电子健康记录（EHR）中的数据进行研究。大语言模型可能会改进利用EHR笔记的NLP应用程序。本研究的目的是评估使用Chat生成预训练变换器4（ChatGPT-4）进行零样本学习提取症状和体征的性能，并将其性能与使用注释数据开发的基线机器学习和基于规则的方法进行比较。

方法和结果

从退伍军人医疗系统的国家EHR数据的非结构化临床笔记中，我们提取了1999个包含心力衰竭症状和体征相关关键词的文本片段，然后由两名临床医生进行注释。我们还创建了102个合成片段，这些片段在语义上与从原始1999个片段中随机选择的片段相似。作者在使用ChatGPT-4的症状和体征提取任务中，采用两种不同形式的提示工程应用零样本学习，利用这些合成片段。为了进行比较，使用原始的1999个注释文本片段训练基于机器学习和基于规则的方法的基线模型，然后用于对102个合成片段进行分类。最佳的零样本学习应用程序实现了90.6%的精确率、100%的召回率和95%的F1分数，优于最佳的基线方法，后者实现了54.9%的精确率、82.4%的召回率和65.5%的F1分数。提示风格和温度设置影响零样本学习性能。