
Large-Scale Validation of the Feasibility of GPT-4 as a Proofreading Tool for Head CT Reports.

Authors

Kim Songsoo, Kim Donghyun, Shin Hyun Joo, Lee Seung Hyun, Kang Yeseul, Jeong Sejin, Kim Jaewoong, Han Miran, Lee Seong-Joon, Kim Joonho, Yum Jungyon, Han Changho, Yoon Dukyong

Affiliations

From the Departments of Biomedical Systems Informatics (S.K., Jaewoong Kim, C.H., D.Y.) and Neurology (Joonho Kim, J.Y.), Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul 03722, Republic of Korea; Department of Radiology, Central Draft Physical Examination Office of Military Manpower Administration, Daegu, Republic of Korea (D.K.); Department of Radiology, Research Institute of Radiological Science and Center for Clinical Imaging Data Science (H.J.S., Y.K., S.J.), and Center for Digital Health (H.J.S., D.Y.), Yongin Severance Hospital, Yonsei University College of Medicine, Yongin, Republic of Korea; Department of Radiology, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Republic of Korea (S.H.L.); Departments of Radiology (M.H.) and Neurology (S.J.L.), Ajou University Hospital, Ajou University School of Medicine, Suwon, Republic of Korea; and Institute for Innovation in Digital Healthcare, Severance Hospital, Seoul, Republic of Korea (D.Y.).

Publication

Radiology. 2025 Jan;314(1):e240701. doi: 10.1148/radiol.240701.

DOI: 10.1148/radiol.240701
PMID: 39873601
Abstract

Background The increasing workload of radiologists can lead to burnout and errors in radiology reports. Large language models, such as OpenAI's GPT-4, hold promise as error revision tools for radiology. Purpose To test the feasibility of GPT-4 use by determining its error detection, reasoning, and revision performance on head CT reports with varying error types and to validate its clinical utility by comparison with human readers. Materials and Methods A total of 10 300 head CT reports were retrospectively extracted from the Medical Information Mart for Intensive Care III public dataset. In experiment 1, among the 300 unaltered reports and 300 versions with applied errors, GPT-4 optimization was initially conducted with 200 reports. The remaining 400 were used for evaluation of error type detection, reasoning, and revision, as well as the analysis of reports with undetected errors. The performance was also compared with that of human readers. In experiment 2, the detection performance of GPT-4 was validated on 10 000 unaltered reports that were deemed error-free by physicians, and an analysis of false-positive results was conducted. A permutation test was conducted to assess differences in performance. Results GPT-4 demonstrated commendable performance in error detection (sensitivity, 84% for interpretive error and 89% for factual error), reasoning, and revision. Compared with GPT-4, human readers had worse factual error detection sensitivity (0.33-0.69 vs 0.89; P = .008 for radiologist 4, P < .001 for others) and took longer to review (82-121 seconds vs 16 seconds; P < .001). In 10 000 reports, GPT-4 detected 96 errors, with a low positive predictive value of 0.05, yet 14% of the false-positive responses were potentially beneficial. Conclusion GPT-4 effectively detects, reasons, and revises errors in radiology reports. While it shows excellent performance in identifying factual errors, its ability to prioritize clinically significant findings is limited. Recognizing its strengths and limitations, GPT-4 could serve as a feasible tool. © RSNA, 2025 See also the editorial by Choi in this issue.
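The metrics in the abstract are simple ratios over a 2×2 confusion table, and the group differences were assessed with a permutation test. A minimal sketch of both, using hypothetical per-cell counts (the abstract reports only the final ratios, not the underlying counts, so the numbers below are illustrative assumptions):

```python
import random

def sensitivity(tp, fn):
    """Fraction of reports containing a real error that the model flags: TP / (TP + FN)."""
    return tp / (tp + fn)

def ppv(tp, fp):
    """Positive predictive value: fraction of flagged reports with a true error, TP / (TP + FP)."""
    return tp / (tp + fp)

def permutation_test(a, b, n_iter=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Pools the two samples, repeatedly reshuffles the group labels,
    and returns the fraction of shuffles whose absolute mean
    difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            extreme += 1
    return extreme / n_iter

# Illustrative counts only: 89 of 100 error-containing reports flagged
# reproduces the reported factual-error sensitivity of 0.89.
print(sensitivity(tp=89, fn=11))   # 0.89

# Experiment 2: 96 flags on 10 000 physician-vetted reports at PPV 0.05
# implies on the order of 5 true positives among the 96 flags.
print(ppv(tp=5, fp=91))            # ~0.052
```

The permutation test makes no distributional assumptions, which suits small reader-study samples; with only a handful of readers per group, exhaustively enumerating label assignments is also feasible instead of random shuffling.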

Similar Articles

1. Large-Scale Validation of the Feasibility of GPT-4 as a Proofreading Tool for Head CT Reports.
Radiology. 2025 Jan;314(1):e240701. doi: 10.1148/radiol.240701.
2. Potential of GPT-4 for Detecting Errors in Radiology Reports: Implications for Reporting Accuracy.
Radiology. 2024 Apr;311(1):e232714. doi: 10.1148/radiol.232714.
3. Multilingual feasibility of GPT-4o for automated Voice-to-Text CT and MRI report transcription.
Eur J Radiol. 2025 Jan;182:111827. doi: 10.1016/j.ejrad.2024.111827. Epub 2024 Nov 17.
4. Lung Cancer Staging Using Chest CT and FDG PET/CT Free-Text Reports: Comparison Among Three ChatGPT Large Language Models and Six Human Readers of Varying Experience.
AJR Am J Roentgenol. 2024 Dec;223(6):e2431696. doi: 10.2214/AJR.24.31696. Epub 2024 Sep 4.
5. Evaluating the Performance and Bias of Natural Language Processing Tools in Labeling Chest Radiograph Reports.
Radiology. 2024 Oct;313(1):e232746. doi: 10.1148/radiol.232746.
6. Large Language Models for Automated Synoptic Reports and Resectability Categorization in Pancreatic Cancer.
Radiology. 2024 Jun;311(3):e233117. doi: 10.1148/radiol.233117.
7. Privacy-ensuring Open-weights Large Language Models Are Competitive with Closed-weights GPT-4o in Extracting Chest Radiography Findings from Free-Text Reports.
Radiology. 2025 Jan;314(1):e240895. doi: 10.1148/radiol.240895.
8. Generative Large Language Models for Detection of Speech Recognition Errors in Radiology Reports.
Radiol Artif Intell. 2024 Mar;6(2):e230205. doi: 10.1148/ryai.230205.
9. Large language models for error detection in radiology reports: a comparative analysis between closed-source and privacy-compliant open-source models.
Eur Radiol. 2025 Feb 20. doi: 10.1007/s00330-025-11438-y.
10. The Treasure Trove Hidden in Plain Sight: The Utility of GPT-4 in Chest Radiograph Evaluation.
Radiology. 2024 Nov;313(2):e233441. doi: 10.1148/radiol.233441.

Cited By

1. In-Context Learning with Large Language Models: A Simple and Effective Approach to Improve Radiology Report Labeling.
Healthc Inform Res. 2025 Jul;31(3):295-309. doi: 10.4258/hir.2025.31.3.295. Epub 2025 Jul 31.
2. Large Language Models in Cancer Imaging: Applications and Future Perspectives.
J Clin Med. 2025 May 8;14(10):3285. doi: 10.3390/jcm14103285.