Applying large language model for automated quality scoring of radiology requisitions using a standardized criteria.

Authors

Büyüktoka Raşit Eren, Surucu Murat, Erekli Derinkaya Pelin Berfin, Adibelli Zehra Hilal, Salbas Ali, Koc Ali Murat, Buyuktoka Asli Dilara, Isler Yalcın, Ugur Mehmet Alperen, Isiklar Elif

Affiliations

Department of Radiology, Izmir Foça State Hospital, Izmir, Türkiye.

Bucak Computer and Informatics Faculty, Burdur Mehmet Akif Ersoy University, Burdur, Türkiye.

Publication information

Eur Radiol. 2025 Aug 20. doi: 10.1007/s00330-025-11933-2.

PMID: 40836020

Abstract

OBJECTIVES

To create and test a locally adapted large language model (LLM) for automated scoring of radiology requisitions based on the Reason for exam Imaging Reporting and Data System (RI-RADS), and to evaluate its performance against reference standards.

MATERIALS AND METHODS

This retrospective, two-center study included 131,683 radiology requisitions from two institutions. A bidirectional encoder representations from transformers (BERT)-based model was trained using 101,563 requisitions from Center 1 (including 1500 synthetic examples) and externally tested on 18,887 requisitions from Center 2. The model's performance for two different classification strategies was evaluated against a reference standard created by three radiologists. Model performance was assessed using Cohen's Kappa, accuracy, F1-score, sensitivity, and specificity with 95% confidence intervals.
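The agreement and classification metrics named above can be illustrated with a minimal sketch. The abstract does not describe the authors' evaluation code, so the function names and the toy labels below are assumptions made purely for illustration; the formulas are the standard definitions of Cohen's kappa, sensitivity, specificity, and F1-score.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two label sequences."""
    n = len(y_true)
    # Observed agreement: fraction of items where both raters give the same label.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: product of the two raters' marginal label frequencies.
    p_e = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def per_class_metrics(y_true, y_pred, label):
    """One-vs-rest sensitivity, specificity, and F1 for a single category."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy labels (hypothetical, not study data): reference standard vs. model output.
reference = ["D", "D", "D", "X", "X", "C", "A", "D"]
predicted = ["D", "D", "D", "X", "X", "D", "D", "D"]

kappa = cohen_kappa(reference, predicted)
d_metrics = per_class_metrics(reference, predicted, "D")
print(f"kappa={kappa:.3f}", d_metrics)
```

In this toy example the rare categories A and C are always misclassified as D, which lowers kappa even though raw accuracy looks reasonable; that is one reason kappa is reported alongside accuracy.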

RESULTS

A total of 18,887 requisitions were evaluated in the external test set. External testing yielded an F1-score of 0.93 (95% CI: 0.912-0.943) and κ = 0.88 (95% CI: 0.871-0.884). Performance was highest in the common categories RI-RADS D and X (F1 ≥ 0.96) and lowest in the rare categories RI-RADS A and B (F1 ≤ 0.49). When the categories were grouped into three classes (adequate, inadequate, and unacceptable), overall model performance improved (F1-score = 0.97; 95% CI: 0.96-0.97).
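A short sketch can show why scores tend to rise when fine-grained categories are merged before evaluation. The abstract names the three groups but does not state which RI-RADS categories fall into each, so the `GROUPS` mapping below is a hypothetical assumption, as are the helper names and the toy labels.

```python
# Hypothetical mapping: the abstract does not specify how A, B, C, D, and X
# are assigned to the three groups, so this assignment is an assumption.
GROUPS = {
    "A": "adequate",
    "B": "adequate",
    "C": "inadequate",
    "D": "inadequate",
    "X": "unacceptable",
}


def collapse(labels, mapping=GROUPS):
    """Replace each fine-grained category with its coarse group."""
    return [mapping[label] for label in labels]


def accuracy(y_true, y_pred):
    """Fraction of items where prediction and reference agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels (hypothetical, not study data).
reference = ["A", "B", "C", "D", "D", "X"]
predicted = ["B", "B", "D", "D", "D", "X"]

fine_accuracy = accuracy(reference, predicted)
coarse_accuracy = accuracy(collapse(reference), collapse(predicted))
print(fine_accuracy, coarse_accuracy)
```

Errors between neighbouring categories (A vs. B, C vs. D) disappear once those categories share a group, so the coarse score can only stay the same or increase under such a mapping.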

CONCLUSION

The locally adapted BERT-based model demonstrated high performance and almost perfect agreement with radiologists in automated RI-RADS scoring, showing promise for integration into radiology workflows to improve requisition completeness and communication.

KEY POINTS

Question: Can an LLM accurately and automatically score radiology requisitions based on standardized criteria to address the challenges of incomplete information in radiological practice?

Findings: A locally adapted BERT-based model demonstrated high performance (F1-score 0.93) and almost perfect agreement with radiologists in automated RI-RADS scoring across a large, multi-institutional dataset.

Clinical relevance: LLMs offer a scalable solution for automated scoring of radiology requisitions, with the potential to improve workflow in radiology. Further improvement and integration into clinical practice could enhance communication, contributing to better diagnoses and patient care.
