From Revisions to Insights: Converting Radiology Report Revisions into Actionable Educational Feedback Using Generative AI Models.

Author information

Lyo Shawn, Mohan Suyash, Hassankhani Alvand, Noor Abass, Dako Farouk, Cook Tessa

Affiliation

Department of Radiology, Hospital of the University of Pennsylvania, Philadelphia, PA, USA.

Publication information

J Imaging Inform Med. 2025 Apr;38(2):1265-1279. doi: 10.1007/s10278-024-01233-4. Epub 2024 Aug 19.

Abstract

Expert feedback on trainees' preliminary reports is crucial for radiologic training, but real-time feedback can be challenging due to non-contemporaneous, remote reading and increasing imaging volumes. Trainee report revisions contain valuable educational feedback, but synthesizing data from raw revisions is challenging. Generative AI models can potentially analyze these revisions and provide structured, actionable feedback. This study used the OpenAI GPT-4 Turbo API to analyze paired synthesized and open-source analogs of preliminary and finalized reports, identify discrepancies, categorize their severity and type, and suggest review topics. Expert radiologists reviewed the output by grading the discrepancies, evaluating the accuracy of the assigned severity and category, and rating the relevance of the suggested review topics. The reproducibility of discrepancy detection and of maximal discrepancy severity was also examined. The model exhibited high sensitivity, detecting significantly more discrepancies than the radiologists (W = 19.0, p < 0.001), with a strong positive correlation (r = 0.778, p < 0.001). Interrater reliability for severity and type was fair (Fleiss' kappa = 0.346 and 0.340, respectively; weighted kappa = 0.622 for severity). The large language model (LLM) achieved a weighted F1 score of 0.66 for severity and 0.64 for type. Generated teaching points were considered relevant in ~85% of cases, and relevance correlated with the maximal discrepancy severity (Spearman ρ = 0.76, p < 0.001). Reproducibility was moderate to good for the number of discrepancies (ICC(2,1) = 0.690) and substantial for maximal discrepancy severity (Fleiss' kappa = 0.718; weighted kappa = 0.94). Generative AI models can effectively identify discrepancies in report revisions and generate relevant educational feedback, offering promise for enhancing radiology training.
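Two of the statistics reported in the abstract, Fleiss' kappa (interrater agreement) and the weighted F1 score (LLM labels vs. a reference), can be illustrated with a short sketch. The Python below is a minimal, standard-library-only illustration under stated assumptions; it is not the study's analysis code. The function names `fleiss_kappa` and `weighted_f1` and the severity labels in the usage example are hypothetical. "Weighted" F1 is assumed here to mean the support-weighted mean of per-class F1 (the convention of scikit-learn's `f1_score(average="weighted")`); the abstract does not say which tool the authors used.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa from a subjects x categories table of rating counts.

    counts[i][j] is how many raters assigned subject i to category j.
    Every row must sum to the same number of raters n (n >= 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all assignments that landed in each category.
    total = n_subjects * n_raters
    p = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Observed agreement per subject (fraction of agreeing rater pairs), averaged.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Agreement expected by chance from the category marginals.
    p_e = sum(pj * pj for pj in p)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_bar - p_e) / (1.0 - p_e)


def weighted_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Per-class F1 averaged with weights equal to each class's true support."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    support = Counter(y_true)    # reference count per class
    predicted = Counter(y_pred)  # predicted count per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = 0.0
    for label, n_true in support.items():
        # Undefined precision (class never predicted) is treated as 0.
        precision = tp[label] / predicted[label] if predicted[label] else 0.0
        recall = tp[label] / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n_true
    # Classes that appear only in y_pred have zero support, so they add nothing.
    return total / len(y_true)


if __name__ == "__main__":
    # Two subjects, three raters, perfect agreement.
    print(round(fleiss_kappa([[3, 0], [0, 3]]), 3))  # 1.0
    # Hypothetical severity labels (not the study's scale): reference vs. model.
    ref = ["minor", "major", "minor", "none"]
    llm = ["minor", "minor", "minor", "none"]
    print(round(weighted_f1(ref, llm), 3))  # 0.65
```

Fleiss' kappa treats categories as nominal, so a "minor" vs. "major" disagreement counts the same as any other mismatch. A weighted kappa gives partial credit to near-misses on an ordinal scale, which likely contributes to the weighted kappa for severity (0.622) being higher than the Fleiss' kappa (0.346).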

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8cdc/11950553/6ee23e5f8e85/10278_2024_1233_Fig1_HTML.jpg