


Zero-Shot Extraction of Seizure Outcomes from Clinical Notes Using Generative Pretrained Transformers.

Authors

Ojemann William K S, Xie Kevin, Liu Kevin, Chang Ellie, Roth Dan, Litt Brian, Ellis Colin A

Affiliations

Department of Bioengineering, University of Pennsylvania, Philadelphia, PA 19104 USA.

Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104 USA.

Publication

J Healthc Inform Res. 2025 Apr 29;9(3):380-400. doi: 10.1007/s41666-025-00198-5. eCollection 2025 Sep.

DOI: 10.1007/s41666-025-00198-5
PMID: 40726746
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12290146/
Abstract


Emerging evidence has shown that pre-trained encoder transformer models can extract information from unstructured clinic note text but require manual annotation for supervised fine-tuning. Large Generative Pre-trained Transformer (GPT) models may streamline this process. In this study, we explore GPTs in zero- and few-shot learning scenarios to analyze clinical health records. We prompt-engineered Llama2 13B to optimize performance in extracting seizure freedom from epilepsy clinic notes and compared it against zero-shot and fine-tuned Bio + ClinicalBERT (BERT) models. Our evaluation encompasses different prompting paradigms, including one-word answers, elaboration-based responses, prompts with date formatting instructions, and prompts with dates in context. We found promising median accuracy rates in seizure freedom classification for zero-shot GPTs: one-word, 62%; elaboration, 50%; prompts with formatted dates, 62%; and prompts with dates in context, 74%. These outperform the zero-shot BERT model (25%) but fall short of the fully fine-tuned BERT model (84%). Furthermore, in sparse contexts, such as notes from general neurologists, the best performing GPT (76%) surpasses the fine-tuned BERT model (67%) in extracting seizure freedom. This study demonstrates the potential of GPTs in extracting clinically relevant information from unstructured EHR text, offering insights into population trends in seizure management, drug effects, risk factors, and healthcare disparities. Moreover, GPTs exhibit superiority over task-specific models in contexts with the potential to include less precise descriptions of epilepsy and seizures, highlighting their versatility. Additionally, simple prompt engineering techniques enhance model accuracy, presenting a framework for leveraging EHR data with zero clinical annotation.
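The four prompting paradigms evaluated in the study can be illustrated as simple template functions. The sketch below is hypothetical: the wording, function names, and answer options are illustrative assumptions, not the authors' exact prompts, and the templates would be passed to a Llama2-style generation call.

```python
# Illustrative zero-shot prompt templates for seizure-freedom extraction.
# These mirror the four paradigms named in the abstract; the exact wording
# used in the study is not reproduced here.

def one_word_prompt(note: str) -> str:
    # Paradigm 1: constrain the model to a one-word answer.
    return (f"Clinic note:\n{note}\n\n"
            "Has this patient been seizure-free since the last visit? "
            "Answer with one word: Yes, No, or Unclear.")

def elaboration_prompt(note: str) -> str:
    # Paradigm 2: ask the model to explain before concluding.
    return (f"Clinic note:\n{note}\n\n"
            "Summarize the evidence in the note, then state whether the "
            "patient has been seizure-free since the last visit.")

def formatted_date_prompt(note: str) -> str:
    # Paradigm 3: add explicit date-formatting instructions.
    return (f"Clinic note:\n{note}\n\n"
            "Report any dates in YYYY-MM-DD format. Has the patient had any "
            "seizures since the last visit? Answer Yes, No, or Unclear.")

def date_in_context_prompt(note: str, visit_date: str) -> str:
    # Paradigm 4: supply the visit date directly in the context.
    return (f"Visit date: {visit_date}\nClinic note:\n{note}\n\n"
            "Has the patient had any seizures since the last visit? "
            "Answer Yes, No, or Unclear.")

# Example with a synthetic (fabricated) note:
note = "Patient reports no events since last visit; last seizure over a year ago."
print(date_in_context_prompt(note, "2024-01-15"))
```

The study's finding that dates-in-context performed best (74% median accuracy) suggests that grounding the temporal reference directly in the prompt, rather than instructing the model about date formats, is the more effective design choice.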

SUPPLEMENTARY INFORMATION

The online version contains supplementary material available at 10.1007/s41666-025-00198-5.


Similar Articles

1. Zero-Shot Extraction of Seizure Outcomes from Clinical Notes Using Generative Pretrained Transformers.
   J Healthc Inform Res. 2025 Apr 29;9(3):380-400. doi: 10.1007/s41666-025-00198-5. eCollection 2025 Sep.
2. A dataset and benchmark for hospital course summarization with adapted large language models.
   J Am Med Inform Assoc. 2025 Mar 1;32(3):470-479. doi: 10.1093/jamia/ocae312.
3. Keyword-optimized template insertion for clinical note classification via prompt-based learning.
   BMC Med Inform Decis Mak. 2025 Jul 3;25(1):247. doi: 10.1186/s12911-025-03071-y.
4. Trajectory-Ordered Objectives for Self-Supervised Representation Learning of Temporal Healthcare Data Using Transformers: Model Development and Evaluation Study.
   JMIR Med Inform. 2025 Jun 4;13:e68138. doi: 10.2196/68138.
5. CACER: Clinical concept Annotations for Cancer Events and Relations.
   J Am Med Inform Assoc. 2024 Nov 1;31(11):2583-2594. doi: 10.1093/jamia/ocae231.
6. Enhancing Clinical Relevance of Pretrained Language Models Through Integration of External Knowledge: Case Study on Cardiovascular Diagnosis From Electronic Health Records.
   JMIR AI. 2024 Aug 6;3:e56932. doi: 10.2196/56932.
7. Automated Transformation of Unstructured Cardiovascular Diagnostic Reports into Structured Datasets Using Sequentially Deployed Large Language Models.
   medRxiv. 2024 Oct 8:2024.10.08.24315035. doi: 10.1101/2024.10.08.24315035.
8. The potential of Generative Pre-trained Transformer 4 (GPT-4) to analyse medical notes in three different languages: a retrospective model-evaluation study.
   Lancet Digit Health. 2025 Jan;7(1):e35-e43. doi: 10.1016/S2589-7500(24)00246-2.
9. Evaluating the Reasoning Capabilities of Large Language Models for Medical Coding and Hospital Readmission Risk Stratification: Zero-Shot Prompting Approach.
   J Med Internet Res. 2025 Jul 30;27:e74142. doi: 10.2196/74142.
10. Language Models for Multilabel Document Classification of Surgical Concepts in Exploratory Laparotomy Operative Notes: Algorithm Development Study.
    JMIR Med Inform. 2025 Jul 9;13:e71176. doi: 10.2196/71176.
