Data extraction from free-text stroke CT reports using GPT-4o and Llama-3.3-70B: the impact of annotation guidelines.

Author information

Wihl Jonas, Rosenkranz Enrike, Schramm Severin, Berberich Cornelius, Griessmair Michael, Woźnicki Piotr, Pinto Francisco, Ziegelmayer Sebastian, Adams Lisa C, Bressem Keno K, Kirschke Jan S, Zimmer Claus, Wiestler Benedikt, Hedderich Dennis, Kim Su Hwan

Affiliations

Department of Diagnostic and Interventional Neuroradiology, TUM University Hospital, School of Medicine and Health, Technical University of Munich, Munich, Germany.

Department of Diagnostic, Interventional and Pediatric Radiology, Inselspital Bern, University of Bern, Bern, Switzerland.

Publication information

Eur Radiol Exp. 2025 Jun 19;9(1):61. doi: 10.1186/s41747-025-00600-2.

Abstract

BACKGROUND

To evaluate the impact of an annotation guideline on the performance of large language models (LLMs) in extracting data from stroke computed tomography (CT) reports.

METHODS

The performance of GPT-4o and Llama-3.3-70B in extracting ten imaging findings from stroke CT reports was assessed in two datasets from a single academic stroke center. Dataset A (n = 200) was a stratified cohort including various pathological findings, whereas dataset B (n = 100) was a consecutive cohort. Initially, an annotation guideline providing clear data extraction instructions was designed based on a review of cases with inter-annotator disagreements in dataset A. For each LLM, data extraction was performed under two conditions: with the annotation guideline included in the prompt and without it.
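The two prompting conditions can be illustrated with a minimal sketch. It is not the study's actual prompt: the function name `build_prompt`, the instruction wording, the placeholder finding names, and the example guideline sentence are all assumptions made for illustration. The only point it shows is that the two conditions differ solely in whether the guideline text is included.

```python
from typing import Optional, Sequence


def build_prompt(report: str, findings: Sequence[str],
                 guideline: Optional[str] = None) -> str:
    """Assemble an extraction prompt, optionally with an annotation guideline."""
    parts = [
        "Extract the following imaging findings from the CT report.",
        "Answer with a JSON object mapping each finding to true or false.",
        "Findings: " + ", ".join(findings),
    ]
    if guideline:
        # Condition "with guideline": the instructions are added to the prompt.
        parts.append("Annotation guideline:\n" + guideline)
    parts.append("Report:\n" + report)
    return "\n\n".join(parts)


# Hypothetical placeholders; the abstract does not name the ten findings.
FINDINGS = ["finding_a", "finding_b"]
report = "No acute intracranial abnormality."

prompt_without = build_prompt(report, FINDINGS)
prompt_with = build_prompt(
    report,
    FINDINGS,
    guideline="Label a finding as present only if the report states it explicitly.",
)
```

Each prompt would then be sent to each model, so that any difference in extraction performance can be attributed to the guideline rather than to other prompt changes.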

RESULTS

GPT-4o consistently demonstrated superior performance over Llama-3.3-70B under identical conditions, with micro-averaged precision ranging from 0.83 to 0.95 for GPT-4o and from 0.65 to 0.86 for Llama-3.3-70B. Across both models and both datasets, incorporating the annotation guideline into the LLM input resulted in higher precision, while recall largely remained stable. In dataset B, the precision of GPT-4o and Llama-3.3-70B improved from 0.83 to 0.95 and from 0.87 to 0.94, respectively. Overall classification performance with and without the annotation guideline was significantly different in five out of six conditions.
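For readers unfamiliar with micro-averaging, the sketch below shows how micro-averaged precision and recall are usually computed for a multi-finding extraction task: true positives, false positives, and false negatives are pooled over all reports and all findings before the ratios are taken. The function name, the data layout (one dictionary of finding labels per report), and the toy data are assumptions for illustration; the abstract does not describe the study's evaluation code.

```python
from typing import Dict, List, Tuple


def micro_precision_recall(gold: List[Dict[str, bool]],
                           pred: List[Dict[str, bool]]) -> Tuple[float, float]:
    """Pool counts over all reports and findings, then compute the ratios."""
    tp = fp = fn = 0
    for gold_labels, pred_labels in zip(gold, pred):
        for finding, truth in gold_labels.items():
            # A finding missing from the model output is treated as "absent".
            guess = pred_labels.get(finding, False)
            if guess and truth:
                tp += 1
            elif guess and not truth:
                fp += 1
            elif truth and not guess:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data with two reports and two hypothetical findings.
gold = [{"a": True, "b": False}, {"a": True, "b": True}]
pred = [{"a": True, "b": True}, {"a": True, "b": False}]
precision, recall = micro_precision_recall(gold, pred)
```

In this toy case the pooled counts are two true positives, one false positive, and one false negative, so both values equal 2/3. A guideline that reduces over-labelling lowers the false-positive count, which raises precision while leaving recall unchanged, matching the pattern reported above.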

CONCLUSION

GPT-4o and Llama-3.3-70B show promising performance in extracting imaging findings from stroke CT reports, although GPT-4o consistently outperformed Llama-3.3-70B. We also provide evidence that well-defined annotation guidelines can enhance LLM data extraction accuracy.

RELEVANCE STATEMENT

Annotation guidelines can improve the accuracy of LLMs in extracting findings from radiological reports, potentially optimizing data extraction for specific downstream applications.

KEY POINTS

LLMs have utility in data extraction from radiology reports, but the role of annotation guidelines remains underexplored.

Data extraction accuracy from stroke CT reports by GPT-4o and Llama-3.3-70B improved when well-defined annotation guidelines were incorporated into the model prompt.

Well-defined annotation guidelines can improve the accuracy of LLMs in extracting imaging findings from radiological reports.
