Cross-Institutional Evaluation of Large Language Models for Radiology Diagnosis Extraction: A Prompt-Engineering Perspective.

Author information

Moassefi Mana, Houshmand Sina, Faghani Shahriar, Chang Peter D, Sun Shawn H, Khosravi Bardia, Triphati Aakash G, Rasool Ghulam, Bhatia Neil K, Folio Les, Andriole Katherine P, Gichoya Judy W, Erickson Bradley J

Affiliations

Mayo Clinic Artificial Intelligence Lab, Department of Radiology, Mayo Clinic, 200 1st Street, S.W., Rochester, MN, 55905, USA.

Department of Radiology, University of California San Francisco, San Francisco, CA, USA.

Publication information

J Imaging Inform Med. 2025 May 8. doi: 10.1007/s10278-025-01523-5.

DOI: 10.1007/s10278-025-01523-5
PMID: 40341981
Abstract

The rapid evolution of large language models (LLMs) offers promising opportunities for radiology report annotation, such as determining whether specific findings are present. This study evaluates the effectiveness of a human-optimized prompt for labeling radiology reports across multiple institutions using LLMs. Six distinct institutions collected 500 radiology reports: 100 in each of 5 categories. A standardized Python script was distributed to participating sites, allowing them to run one common, locally executed LLM with a standard human-optimized prompt. The script ran the LLM's analysis on each report and compared its predictions with reference labels provided by local investigators. Model performance was measured as accuracy, and results were aggregated centrally. The human-optimized prompt demonstrated high consistency across sites and pathologies. Preliminary analysis indicates significant agreement between the LLM's outputs and the investigator-provided reference labels across multiple institutions. At one site, eight LLMs were systematically compared, with Llama 3.1 70b achieving the highest performance in accurately identifying the specified findings. Comparable performance with Llama 3.1 70b was observed at two additional centers, demonstrating the model's robust adaptability to variations in report structure and institutional practice. Our findings illustrate the potential of optimized prompt engineering for leveraging LLMs in cross-institutional radiology report labeling. The approach is straightforward yet maintains high accuracy and adaptability. Future work will explore model robustness to diverse report structures and further refine prompts to improve generalizability.
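The per-site workflow described above runs one local LLM with a fixed prompt over every report, compares each answer with the investigator's reference label, computes accuracy, and then pools results centrally. It can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' distributed script. The prompt wording, the yes/no answer format, the data layout, and every name (`PROMPT_TEMPLATE`, `evaluate_site`, `aggregate`, `toy_llm`, `SITE_A`, `SITE_B`) are hypothetical. `toy_llm` is a keyword-matching stand-in for a real locally hosted model such as Llama 3.1 70b.

```python
# Minimal sketch. The prompt, answer format, data layout, and all names are
# hypothetical illustrations, not the study's actual script.
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical prompt template (the study's human-optimized prompt is not reproduced here).
PROMPT_TEMPLATE = (
    "You are a radiologist. Answer only 'yes' or 'no': "
    "does the report describe {finding}?\n\nReport:\n{report}\n\nAnswer:"
)

# One case = (report text, finding to check, investigator's reference label).
Case = tuple[str, str, bool]


@dataclass
class SiteResult:
    site: str
    correct: int
    total: int

    @property
    def accuracy(self) -> float:
        return self.correct / self.total if self.total else 0.0


def parse_answer(reply: str) -> bool:
    """Map a free-text reply to a binary label, assuming a yes/no answer format."""
    return reply.strip().lower().startswith("yes")


def evaluate_site(site: str, cases: Iterable[Case], llm: Callable[[str], str]) -> SiteResult:
    """Run the same prompt over every local report and count matches with the reference."""
    correct = total = 0
    for report, finding, reference in cases:
        prediction = parse_answer(llm(PROMPT_TEMPLATE.format(finding=finding, report=report)))
        correct += int(prediction == reference)
        total += 1
    return SiteResult(site, correct, total)


def aggregate(results: Iterable[SiteResult]) -> dict:
    """Central step: per-site accuracy plus accuracy pooled over all reports."""
    results = list(results)
    total = sum(r.total for r in results)
    pooled = sum(r.correct for r in results) / total if total else 0.0
    return {"per_site": {r.site: r.accuracy for r in results}, "pooled": pooled}


def toy_llm(prompt: str) -> str:
    """Hypothetical stand-in for a local LLM: naive keyword match that ignores negation."""
    finding = prompt.split("describe ", 1)[1].split("?", 1)[0]
    report = prompt.split("Report:\n", 1)[1].split("\n\nAnswer:", 1)[0]
    return "yes" if finding.lower() in report.lower() else "no"


# Made-up example reports, not study data.
SITE_A = [
    ("Small right apical pneumothorax.", "pneumothorax", True),
    ("No pneumothorax. Lungs are clear.", "pneumothorax", False),
    ("Acute fracture of the distal left radius.", "fracture", True),
    ("Unremarkable study.", "fracture", False),
]
SITE_B = [
    ("Large left pleural effusion.", "pleural effusion", True),
    ("Heart size is normal.", "pleural effusion", False),
]

if __name__ == "__main__":
    summary = aggregate([evaluate_site("A", SITE_A, toy_llm), evaluate_site("B", SITE_B, toy_llm)])
    print(summary)
```

With these made-up reports, site A scores 0.75 because the keyword stand-in labels the negated report ("No pneumothorax") as positive. Site B scores 1.0, and the pooled accuracy is 5 of 6 reports. Negation is a common pitfall for naive matching and is part of what a carefully worded prompt for a real LLM aims to handle. Replacing `toy_llm` with a call into a locally hosted model would leave the scoring and aggregation code unchanged. Pooling by report count, rather than averaging site accuracies, is one assumed choice here. It weights larger sites more heavily.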

