In-Context Learning with Large Language Models: A Simple and Effective Approach to Improve Radiology Report Labeling.

Author information

Kim Songsoo, Kim Donghyun, Kim Jaewoong, Koo Jalim, Yoon Jinsik, Yoon Dukyong

Affiliations

Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul, Korea.

Department of Radiology, Central Draft Physical Examination Office of Military Manpower Administration, Daegu, Korea.

Publication information

Healthc Inform Res. 2025 Jul;31(3):295-309. doi: 10.4258/hir.2025.31.3.295. Epub 2025 Jul 31.

PMID: 40840937
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12370419/

Abstract

OBJECTIVES

This study assessed the effectiveness of in-context learning using Generative Pre-trained Transformer-4 (GPT-4) for labeling radiology reports.

METHODS

In this retrospective study, radiology reports were obtained from the Medical Information Mart for Intensive Care III (MIMIC-III) database. Two structured prompts, a "basic prompt" and an "in-context prompt," were compared. An optimization experiment was conducted to assess consistency and the occurrence of output format errors. The primary labeling experiments were performed on 200 unseen head computed tomography (CT) reports for multi-label classification of predefined labels (Experiment 1) and on 400 unseen abdominal CT reports for multi-label classification of actionable findings (Experiment 2).
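
The abstract does not reproduce the study's prompts. As a rough illustration only, the sketch below shows one way a basic versus an in-context prompt might be assembled, and how format errors and run-to-run consistency could be tallied in an optimization run. Everything here is an assumption: the label list beyond "foreign body" and "mass", the prompt wording, the JSON output format, and the helper names build_prompt, parse_labels, and run_summary are hypothetical, and no model API is called.

```python
from __future__ import annotations

import json
from collections import Counter

# Hypothetical label set: "foreign body" and "mass" are labels named in the
# study; "hemorrhage" and "fracture" are placeholders for illustration.
LABELS = ["foreign body", "mass", "hemorrhage", "fracture"]


def build_prompt(report: str, context: str | None = None) -> str:
    """Build a basic prompt, or an in-context prompt when `context` is given."""
    parts = [
        "You are labeling a radiology report.",
        "Possible labels: " + ", ".join(LABELS) + ".",
    ]
    if context:
        # In-context variant: add labeling criteria / domain knowledge.
        parts.append("Labeling criteria:\n" + context)
    parts.append('Reply only with JSON of the form {"labels": [...]}.')
    parts.append("Report:\n" + report)
    return "\n\n".join(parts)


def parse_labels(output: str) -> frozenset[str] | None:
    """Return the predicted label set, or None for a format error."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return None
    labels = data.get("labels") if isinstance(data, dict) else None
    if not isinstance(labels, list) or not all(lab in LABELS for lab in labels):
        return None
    return frozenset(labels)


def run_summary(outputs: list[str]) -> tuple[int, float]:
    """For repeated runs on one report, return (format errors, consistency),
    where consistency is the share of valid runs matching the modal label set."""
    parsed = [parse_labels(o) for o in outputs]
    valid = [p for p in parsed if p is not None]
    errors = len(parsed) - len(valid)
    if not valid:
        return errors, 0.0
    _, count = Counter(valid).most_common(1)[0]
    return errors, count / len(valid)
```

For example, three repeated outputs {"labels": ["mass"]}, {"labels": ["mass"]}, and the free-text reply "Labels: mass" give (1, 1.0): one format error, and both valid runs agree. Requiring a fixed JSON shape makes format errors easy to count mechanically.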

RESULTS

The inter-reader accuracies in Experiments 1 and 2 were 0.93 and 0.84, respectively. For multi-label classification of head CT reports (Experiment 1), the in-context prompt led to notable increases in F1-scores for the "foreign body" and "mass" labels (gains of 0.66 and 0.22, respectively), whereas improvements for the other labels were modest. In multi-label classification of abdominal CT reports (Experiment 2), in-context prompts produced substantial improvements in F1-scores across all labels compared with basic prompts. Providing context supplied the model with domain-specific knowledge and helped align its existing knowledge with the annotators' labeling criteria, thereby improving performance.
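
The abstract reports per-label F1-scores, F1 gains, and inter-reader accuracy without spelling out how they are computed. A minimal sketch follows, assuming each report's labels form a set and each label is scored as its own binary task (F1 = 2TP / (2TP + FP + FN)). It also assumes that inter-reader accuracy means exact agreement of two readers' label sets per report; the paper may define it differently. The function names are hypothetical.

```python
from __future__ import annotations


def per_label_f1(gold: list[set[str]], pred: list[set[str]],
                 labels: list[str]) -> dict[str, float]:
    """F1 for each label, scoring every label as an independent binary task."""
    scores: dict[str, float] = {}
    for label in labels:
        tp = sum(label in g and label in p for g, p in zip(gold, pred))
        fp = sum(label not in g and label in p for g, p in zip(gold, pred))
        fn = sum(label in g and label not in p for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        # Assumed convention: F1 = 0.0 when the label is absent from both gold and predictions.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def f1_gain(basic: dict[str, float], in_context: dict[str, float]) -> dict[str, float]:
    """Per-label F1 change from the basic prompt to the in-context prompt."""
    return {label: in_context[label] - basic[label] for label in basic}


def exact_match_accuracy(reader_a: list[set[str]], reader_b: list[set[str]]) -> float:
    """Share of reports on which two readers assigned identical label sets
    (an assumed definition of inter-reader accuracy)."""
    if not reader_a or len(reader_a) != len(reader_b):
        raise ValueError("need two non-empty annotation lists of equal length")
    return sum(a == b for a, b in zip(reader_a, reader_b)) / len(reader_a)
```

With toy data (not from the study), four gold label sets [{"mass"}, {"foreign body"}, {}, {"mass", "foreign body"}] and basic-prompt predictions that miss both foreign bodies give F1 of 0.0 for "foreign body" and 1.0 for "mass". A perfect in-context run then yields a gain of 1.0 and 0.0, respectively. Scoring labels independently keeps a rare label such as "foreign body" from being masked by common ones, which is why per-label F1 is the natural way to report gains like those above.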

CONCLUSIONS

In-context learning with GPT-4 consistently improved performance in labeling radiology reports. The approach is particularly effective for subjective labeling tasks, and for objective labeling it allows the model to align its criteria with those of human annotators. It offers a simple, adaptable, researcher-oriented strategy that can be applied to diverse labeling tasks.
