Adam Hammaad, Lin Junjing, Lin Jianchang, Keenan Hillary, Wilson Ashia, Ghassemi Marzyeh
Massachusetts Institute of Technology, Cambridge, MA, USA.
Takeda Pharmaceuticals, Cambridge, MA, USA.
AMIA Annu Symp Proc. 2025 May 22;2024:115-123. eCollection 2024.
Recent work has demonstrated that large language models (LLMs) are powerful tools for clinical information extraction from unstructured text. However, existing approaches have largely ignored the extraction of numeric information such as laboratory tests and vital signs. In this article, we present a case study on organ procurement that evaluates the ability of LLMs to extract numeric data from clinical text. We first describe our LLM-based approach, introducing a prompting strategy for numeric extraction and novel heuristics to combat hallucination. We validate our approach on a hand-annotated set of 298 notes, demonstrating that it has high accuracy, precision and recall. We then highlight the value of our approach for downstream data analysis using a corpus of 43,719 notes on 14,342 potential organ donors. This case study is a key component of an ongoing collaboration that aims to make data on organ procurement publicly available for informatics research.