Evaluating GPT-4o's Performance in the Official European Board of Radiology Exam: A Comprehensive Assessment.

Affiliations

Department of Radiology, Kahramanmaraş Necip Fazıl City Hospital, Kahramanmaraş Necip Fazıl Şehir Hastanesi, Kahramanmaraş 46050, Turkey (M.S.B.).

Department of Radiology, Hospital Clínic de Barcelona, C. de Villarroel, 170, Barcelona 08036, Spain (L.O.).

Publication information

Acad Radiol. 2024 Nov;31(11):4365-4371. doi: 10.1016/j.acra.2024.09.005. Epub 2024 Sep 18.


DOI:10.1016/j.acra.2024.09.005
PMID:39294055
Abstract

RATIONALE AND OBJECTIVES: This study aims to evaluate the performance of generative pre-trained transformer (GPT)-4o in the complete official European Board of Radiology (EBR) exam, designed to assess radiology knowledge, skills, and competence.

MATERIALS AND METHODS: Questions based on text, image, or video and in the format of multiple choice, free-text reporting, or image annotation were uploaded into GPT-4o using standardized prompting. The results were compared to the average scores of radiologists taking the exam in real time.

RESULTS: In Part 1 (multiple response questions and short cases), GPT-4o outperformed both the radiologists' average scores and the maximum pass score (70.2% vs. 58.4% and 60%, respectively). In Part 2 (clinically oriented reasoning evaluation), the performance of GPT-4o was below both the radiologists' average scores and the minimum pass score (52.9% vs. 66.1% and 55%, respectively). The accuracy on questions involving ultrasound images was higher compared to other imaging modalities (accuracy rate, 87.5-100%). For video-based questions, the performance was 50.6%. The model achieved the highest accuracy on most likely diagnosis questions but showed lower accuracy in free-text reporting and direct anatomical assessment in images (100% vs. 31% and 28.6%, respectively).

CONCLUSION: The abilities of GPT-4o in the official EBR exam are particularly noteworthy. This study demonstrates the potential of large language models to assist radiologists in assessing and managing cases from diagnosis to treatment or follow-up recommendations, even with zero-shot prompting.

Similar articles

[1]
Evaluating GPT-4o's Performance in the Official European Board of Radiology Exam: A Comprehensive Assessment.

Acad Radiol. 2024-11

[2]
GPT-4o’s competency in answering the simulated written European Board of Interventional Radiology exam compared to a medical student and experts in Germany and its ability to generate exam items on interventional radiology: a descriptive study.

J Educ Eval Health Prof. 2024

[3]
Diagnostic accuracy of vision-language models on Japanese diagnostic radiology, nuclear medicine, and interventional radiology specialty board examinations.

Jpn J Radiol. 2024-12

[4]
GPT-4 Turbo with Vision fails to outperform text-only GPT-4 Turbo in the Japan Diagnostic Radiology Board Examination.

Jpn J Radiol. 2024-8

[5]
Large Language Models as Tools to Generate Radiology Board-Style Multiple-Choice Questions.

Acad Radiol. 2024-9

[6]
Performance of GPT-4 with Vision on Text- and Image-based ACR Diagnostic Radiology In-Training Examination Questions.

Radiology. 2024-9

[7]
Performance of Progressive Generations of GPT on an Exam Designed for Certifying Physicians as Certified Clinical Densitometrists.

J Clin Densitom. 2024

[8]
Evaluating AI Proficiency in Nuclear Cardiology: Large Language Models take on the Board Preparation Exam.

medRxiv. 2024-7-16

[9]
Generative pretrained transformer-4, an artificial intelligence text predictive model, has a high capability for passing novel written radiology exam questions.

Int J Comput Assist Radiol Surg. 2024-4

[10]
GPT-4o vs. Human Candidates: Performance Analysis in the Polish Final Dentistry Examination.

Cureus. 2024-9-6

Cited by

[1]
Effectiveness of the GPT-4o Model in Interpreting Electrocardiogram Images for Cardiac Diagnostics: Diagnostic Accuracy Study.

JMIR AI. 2025-8-22

[2]
Large language models for efficient whole-organ MRI score-based reports and categorization in knee osteoarthritis.

Insights Imaging. 2025-5-14

[3]
Extracting Pulmonary Embolism Diagnoses From Radiology Impressions Using GPT-4o: Large Language Model Evaluation Study.

JMIR Med Inform. 2025-4-9

[4]
Performance Evaluation of GPT-4o and o1-Preview Using the Certification Examination for the Japanese 'Operations Chief of Radiography With X-rays'.

Cureus. 2024-11-22

[5]
Assessing the ability of GPT-4o to visually recognize medications and provide patient education.

Sci Rep. 2024-11-5
