Clinical Information Extraction with Large Language Models: A Case Study on Organ Procurement

Authors

Adam Hammaad, Lin Junjing, Lin Jianchang, Keenan Hillary, Wilson Ashia, Ghassemi Marzyeh

Affiliations

Massachusetts Institute of Technology, Cambridge, MA, USA.

Takeda Pharmaceuticals, Cambridge, MA, USA.

Publication information

AMIA Annu Symp Proc. 2025 May 22;2024:115-123. eCollection 2024.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12099322/

Abstract

Recent work has demonstrated that large language models (LLMs) are powerful tools for clinical information extraction from unstructured text. However, existing approaches have largely ignored the extraction of numeric information such as laboratory tests and vital signs. In this article, we present a case study on organ procurement that evaluates the ability of LLMs to extract numeric data from clinical text. We first describe our LLM-based approach, introducing a prompting strategy for numeric extraction and novel heuristics to combat hallucination. We validate our approach on a hand-annotated set of 298 notes, demonstrating that it has high accuracy, precision and recall. We then highlight the value of our approach for downstream data analysis using a corpus of 43,719 notes on 14,342 potential organ donors. This case study is a key component of an ongoing collaboration that aims to make data on organ procurement publicly available for informatics research.
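The abstract does not spell out the hallucination heuristics or the evaluation procedure, so the following is only an illustrative sketch under stated assumptions. It assumes one plausible heuristic (discard any extracted number that does not literally appear in the note) and scores the result against hand annotations with precision and recall. All function names, field names, and the sample note are hypothetical and are not taken from the paper.

```python
import re


def value_in_text(value: float, text: str) -> bool:
    """Return True if the numeric value appears somewhere in the note text."""
    numbers = re.findall(r"\d+(?:\.\d+)?", text)
    return any(float(n) == value for n in numbers)


def filter_grounded(extracted: dict, note: str) -> dict:
    """Keep only extracted fields whose value is literally present in the note.

    Assumed heuristic for illustration; the paper's actual heuristics may differ.
    """
    return {k: v for k, v in extracted.items() if value_in_text(v, note)}


def precision_recall(predicted: dict, gold: dict) -> tuple:
    """Compute precision and recall of predicted field values against gold labels."""
    true_positives = sum(1 for k, v in predicted.items() if gold.get(k) == v)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical note, model output, and hand annotation.
note = "Na 141, creatinine 1.2 mg/dL, HR 88."
llm_output = {"sodium": 141.0, "creatinine": 1.2, "bilirubin": 0.8}
gold = {"sodium": 141.0, "creatinine": 1.2, "heart_rate": 88.0}

grounded = filter_grounded(llm_output, note)  # bilirubin is dropped: 0.8 is not in the note
precision, recall = precision_recall(grounded, gold)
```

In this toy case the grounding filter removes the unsupported bilirubin value, which raises precision, while the missed heart rate still counts against recall. A literal-match filter of this kind cannot catch a number that is present in the note but attached to the wrong test.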
