Toward Foundation Models in Radiology? Quantitative Assessment of GPT-4V's Multimodal and Multianatomic Region Capabilities.

Affiliations

From the Institute of Radiology (Q.D.S., L.S.K., G.N., A.K.M., S.M., I.E., J.R., C.W., C.S., O.W.H., A.S.) and Department of Cranio- and Maxillofacial Surgery (F.N.), University of Regensburg Medical Center, Franz-Josef-Strauss-Allee 11, 93053 Regensburg, Germany; Department of Radiology, Division of Neuroradiology, Massachusetts General Hospital, Harvard Medical School, Boston, Mass (Q.D.S.); Department of Radiology, Bayreuth Medical Center, Bayreuth, Germany (M.S.); Center of Neuroradiology, medbo District Hospital and University Medical Center Regensburg, Regensburg, Germany (I.W., C.W.); and Department of Radiology, Donaustauf Hospital, Donaustauf, Germany (O.W.H.).

Publication information

Radiology. 2024 Nov;313(2):e240955. doi: 10.1148/radiol.240955.

DOI:10.1148/radiol.240955
PMID:39589253
Abstract

Background: Large language models have already demonstrated potential in medical text processing. GPT-4V, a large vision-language model from OpenAI, has shown potential for medical imaging, yet a quantitative analysis is lacking.

Purpose: To quantitatively assess the performance of GPT-4V in interpreting radiologic images using unseen data.

Materials and Methods: This retrospective study included single representative abnormal and healthy control images from neuroradiology, cardiothoracic radiology, and musculoskeletal radiology (CT, MRI, radiography) to generate reports using GPT-4V via the application programming interface from February to March 2024. The factual correctness of free-text reports and the performance in detecting abnormalities in binary classification tasks were assessed using accuracy, sensitivity, and specificity. The binary classification performance was compared with that of a first-year nonradiologist in training and four board-certified radiologists.

Results: A total of 515 images in 470 patients (median age, 61 years [IQR, 44-71 years]; 267 male) were included, of which 345 images were abnormal. GPT-4V correctly identified the imaging modality and anatomic region in 100% (515 of 515) and 99.2% (511 of 515) of images, respectively. Diagnostic accuracy in free-text reports was between 0% (0 of 33 images) for pneumothorax (CT and radiography) and 90% (45 of 50 images) for brain tumor (MRI). In binary classification tasks, GPT-4V showed sensitivities between 56% (14 of 25 images) for ischemic stroke and 100% (25 of 25 images) for brain hemorrhage and specificities between 8% (two of 25 images) for brain hemorrhage and 52% (13 of 25 images) for pneumothorax, compared with a pooled sensitivity of 97.2% (1103 of 1135 images) and pooled specificity of 97.2% (1084 of 1115 images) for the human readers across all tasks. The model exhibited a clear tendency to overdiagnose abnormalities, with 86.5% (147 of 170 images) and 67.7% (151 of 223 images) false-positive rates for the free-text and binary classification tasks, respectively.

Conclusion: GPT-4V, in its earliest version, recognized medical image content and reliably determined the modality and anatomic region from single images. However, GPT-4V failed to detect, classify, or rule out abnormalities in image interpretation.

© RSNA, 2024
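
The percentages in the abstract follow directly from the counts given in parentheses. The sketch below recomputes a few of them; the helper name `rate`, the dictionary keys, and the one-decimal rounding are assumptions made for illustration and are not taken from the study.

```python
def rate(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage, rounded to one decimal."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * numerator / denominator, 1)


# (numerator, denominator) pairs copied from the abstract.
reported_counts = {
    "anatomic region identified": (511, 515),
    "free-text accuracy, brain tumor (MRI)": (45, 50),
    "sensitivity, ischemic stroke": (14, 25),
    "specificity, brain hemorrhage": (2, 25),
    "specificity, pneumothorax": (13, 25),
    "pooled human sensitivity": (1103, 1135),
    "pooled human specificity": (1084, 1115),
    "false-positive rate, free text": (147, 170),
    "false-positive rate, binary tasks": (151, 223),
}

if __name__ == "__main__":
    for label, (num, den) in reported_counts.items():
        print(f"{label}: {rate(num, den)}% ({num} of {den})")
```

Because specificity is the complement of the false-positive rate, 151 false positives in 223 images would correspond to an overall specificity of about 32.3% (72 of 223) in the binary tasks, provided the 223 images are the healthy controls, as the denominator suggests. That figure is derived here and is not stated in the abstract.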

Similar articles

1. Toward Foundation Models in Radiology? Quantitative Assessment of GPT-4V's Multimodal and Multianatomic Region Capabilities.
   Radiology. 2024 Nov;313(2):e240955. doi: 10.1148/radiol.240955.
2. Unveiling GPT-4V's hidden challenges behind high accuracy on USMLE questions: Observational Study.
   J Med Internet Res. 2025 Feb 7;27:e65146. doi: 10.2196/65146.
3. Assessing GPT-4 multimodal performance in radiological image analysis.
   Eur Radiol. 2025 Apr;35(4):1959-1965. doi: 10.1007/s00330-024-11035-5. Epub 2024 Aug 30.
4. Evaluation of GPT Large Language Model Performance on RSNA 2023 Case of the Day Questions.
   Radiology. 2024 Oct;313(1):e240609. doi: 10.1148/radiol.240609.
5. Evaluating GPT-V4 (GPT-4 with Vision) on Detection of Radiologic Findings on Chest Radiographs.
   Radiology. 2024 May;311(2):e233270. doi: 10.1148/radiol.233270.
6. ChatGPT's diagnostic performance based on textual vs. visual information compared to radiologists' diagnostic performance in musculoskeletal radiology.
   Eur Radiol. 2025 Jan;35(1):506-516. doi: 10.1007/s00330-024-10902-5. Epub 2024 Jul 12.
7. Performance of GPT-4 with Vision on Text- and Image-based ACR Diagnostic Radiology In-Training Examination Questions.
   Radiology. 2024 Sep;312(3):e240153. doi: 10.1148/radiol.240153.
8. Comparing Diagnostic Accuracy of Radiologists versus GPT-4V and Gemini Pro Vision Using Image Inputs from Diagnosis Please Cases.
   Radiology. 2024 Jul;312(1):e240273. doi: 10.1148/radiol.240273.
9. Glaucoma Detection and Feature Identification via GPT-4V Fundus Image Analysis.
   Ophthalmol Sci. 2024 Nov 29;5(2):100667. doi: 10.1016/j.xops.2024.100667. eCollection 2025 Mar-Apr.
10. GPT-4 Vision: Multi-Modal Evolution of ChatGPT and Potential Role in Radiology.
    Cureus. 2024 Aug 31;16(8):e68298. doi: 10.7759/cureus.68298. eCollection 2024 Aug.

Cited by

1. Current Landscape and Future Directions Regarding Generative Large Language Models in Stroke Care: Scoping Review.
   JMIR Med Inform. 2025 Aug 7;13:e76636. doi: 10.2196/76636.
2. Do LLMs Have 'the Eye' for MRI? Evaluating GPT-4o, Grok, and Gemini on Brain MRI Performance: First Evaluation of Grok in Medical Imaging and a Comparative Analysis.
   Diagnostics (Basel). 2025 May 24;15(11):1320. doi: 10.3390/diagnostics15111320.
3. Large language models for dermatological image interpretation - a comparative study.
   Diagnosis (Berl). 2025 May 23. doi: 10.1515/dx-2025-0014.
4. Radiology AI and sustainability paradox: environmental, economic, and social dimensions.
   Insights Imaging. 2025 Apr 17;16(1):88. doi: 10.1186/s13244-025-01962-2.