From Revisions to Insights: Converting Radiology Report Revisions into Actionable Educational Feedback Using Generative AI Models.

Author information

Lyo Shawn, Mohan Suyash, Hassankhani Alvand, Noor Abass, Dako Farouk, Cook Tessa

Affiliation

Department of Radiology, Hospital of the University of Pennsylvania, Philadelphia, PA, USA.

Publication information

J Imaging Inform Med. 2025 Apr;38(2):1265-1279. doi: 10.1007/s10278-024-01233-4. Epub 2024 Aug 19.

DOI: 10.1007/s10278-024-01233-4
PMID: 39160366
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11950553/
Abstract

Expert feedback on trainees' preliminary reports is crucial for radiologic training, but real-time feedback can be challenging due to non-contemporaneous, remote reading and increasing imaging volumes. Trainee report revisions contain valuable educational feedback, but synthesizing data from raw revisions is challenging. Generative AI models can potentially analyze these revisions and provide structured, actionable feedback. This study used the OpenAI GPT-4 Turbo API to analyze paired synthesized and open-source analogs of preliminary and finalized reports, identify discrepancies, categorize their severity and type, and suggest review topics. Expert radiologists reviewed the output by grading the discrepancies, evaluating the accuracy of the severity and category assignments, and rating the relevance of the suggested review topics. The reproducibility of discrepancy detection and maximal discrepancy severity was also examined. The model exhibited high sensitivity, detecting significantly more discrepancies than radiologists (W = 19.0, p < 0.001), with a strong positive correlation between the model's and the radiologists' discrepancy counts (r = 0.778, p < 0.001). Interrater reliability for severity and type was fair (Fleiss' kappa = 0.346 and 0.340, respectively; weighted kappa = 0.622 for severity). The LLM achieved a weighted F1 score of 0.66 for severity and 0.64 for type. Generated teaching points were considered relevant in ~85% of cases, and relevance correlated with the maximal discrepancy severity (Spearman ρ = 0.76, p < 0.001). Reproducibility was moderate to good for the number of discrepancies (ICC(2,1) = 0.690) and substantial for maximal discrepancy severity (Fleiss' kappa = 0.718; weighted kappa = 0.94). Generative AI models can effectively identify discrepancies in report revisions and generate relevant educational feedback, offering promise for enhancing radiology training.
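Two of the metrics reported above, Fleiss' kappa (agreement among several raters on categorical labels) and the weighted F1 score (per-class F1 averaged with weights proportional to each class's frequency in the reference labels), can be illustrated with a short standard-library Python sketch. This is not the authors' analysis code, which the abstract does not describe; the function names `fleiss_kappa` and `weighted_f1` and the severity labels ("none", "minor", "major") are hypothetical. The sketch assumes every item is rated by the same number of raters, and kappa is undefined when all ratings fall into a single category.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss' kappa for items that were each rated by the same number of raters.

    ratings[i] holds the category labels that the raters assigned to item i.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("need >= 2 raters and the same number of ratings per item")
    categories = sorted({label for row in ratings for label in row})
    counts = [Counter(row) for row in ratings]  # n_ij: raters putting item i in category j

    # Observed agreement: P_i = (sum_j n_ij^2 - n) / (n(n - 1)), averaged over items.
    p_bar = sum(
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_items

    # Chance agreement: P_e = sum_j p_j^2, where p_j is the overall share of category j.
    total = n_items * n_raters
    p_e = sum((sum(c[k] for c in counts) / total) ** 2 for k in categories)
    return (p_bar - p_e) / (1 - p_e)  # division by zero if only one category occurs


def weighted_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)
    score = 0.0
    for label in sorted(support):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = support[label] - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * support[label] / len(y_true)
    return score


if __name__ == "__main__":
    # Hypothetical severity labels from three raters for two discrepancies.
    print(round(fleiss_kappa([["minor", "minor", "major"], ["minor", "major", "major"]]), 3))  # -0.333
    # Hypothetical reference vs. model severity labels for four discrepancies.
    print(round(weighted_f1(["minor", "major", "minor", "none"],
                            ["minor", "minor", "minor", "none"]), 3))  # 0.65
```

In practice, library implementations such as `statsmodels.stats.inter_rater.fleiss_kappa` and scikit-learn's `f1_score(..., average="weighted")` can serve as a cross-check; the explicit loops here only make each term of the two formulas visible.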

Similar articles

1
From Revisions to Insights: Converting Radiology Report Revisions into Actionable Educational Feedback Using Generative AI Models.
J Imaging Inform Med. 2025 Apr;38(2):1265-1279. doi: 10.1007/s10278-024-01233-4. Epub 2024 Aug 19.
2
Enhancing radiology training with GPT-4: Pilot analysis of automated feedback in trainee preliminary reports.
Curr Probl Diagn Radiol. 2025 Mar-Apr;54(2):151-158. doi: 10.1067/j.cpradiol.2024.08.003. Epub 2024 Aug 15.
3
Evaluating the performance of Generative Pre-trained Transformer-4 (GPT-4) in standardizing radiology reports.
Eur Radiol. 2024 Jun;34(6):3566-3574. doi: 10.1007/s00330-023-10384-x. Epub 2023 Nov 8.
4
Evaluation of Generative Artificial Intelligence Models in Predicting Pediatric Emergency Severity Index Levels.
Pediatr Emerg Care. 2025 Apr 1;41(4):251-255. doi: 10.1097/PEC.0000000000003315. Epub 2025 Jan 7.
5
Diagnostic Accuracy and Clinical Value of a Domain-specific Multimodal Generative AI Model for Chest Radiograph Report Generation.
Radiology. 2025 Mar;314(3):e241476. doi: 10.1148/radiol.241476.
6
A Language Model-Powered Simulated Patient With Automated Feedback for History Taking: Prospective Study.
JMIR Med Educ. 2024 Aug 16;10:e59213. doi: 10.2196/59213.
7
Use of ChatGPT Large Language Models to Extract Details of Recommendations for Additional Imaging From Free-Text Impressions of Radiology Reports.
AJR Am J Roentgenol. 2025 Apr;224(4):e2432341. doi: 10.2214/AJR.24.32341. Epub 2025 Jan 29.
8
Evaluation of Large Language Models in Tailoring Educational Content for Cancer Survivors and Their Caregivers: Quality Analysis.
JMIR Cancer. 2025 Apr 7;11:e67914. doi: 10.2196/67914.
9
The Role of Report Comparison, Analysis, and Discrepancy Categorization in Resident Education.
AJR Am J Roentgenol. 2016 Dec;207(6):1223-1231. doi: 10.2214/AJR.16.16245. Epub 2016 Sep 22.
10
Artificial Intelligence for Teaching Case Curation: Evaluating Model Performance on Imaging Report Discrepancies.
Acad Radiol. 2025 Jun;32(6):3139-3146. doi: 10.1016/j.acra.2025.02.011. Epub 2025 Mar 9.

Cited by

1
The role of generative artificial intelligence in psychiatric education- a scoping review.
BMC Med Educ. 2025 Mar 25;25(1):438. doi: 10.1186/s12909-025-07026-9.
2
Evaluation of radiology residents' reporting skills using large language models: an observational study.
Jpn J Radiol. 2025 Mar 8. doi: 10.1007/s11604-025-01764-y.
3
Generative artificial intelligence in graduate medical education.
