
GPT-4o’s competency in answering the simulated written European Board of Interventional Radiology exam compared to a medical student and experts in Germany and its ability to generate exam items on interventional radiology: a descriptive study.

Affiliation

Department of Diagnostic and Interventional Radiology, University of Leipzig, Leipzig, Germany.

Publication information

J Educ Eval Health Prof. 2024;21:21. doi: 10.3352/jeehp.2024.21.21. Epub 2024 Aug 20.

DOI: 10.3352/jeehp.2024.21.21

PMID: 39161266

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11894030/

Abstract

PURPOSE

This study aimed to determine whether ChatGPT-4o, a generative artificial intelligence (AI) platform, could pass a simulated written European Board of Interventional Radiology (EBIR) exam, and whether GPT-4o could be used to train medical students and interventional radiologists with different levels of expertise by generating exam items on interventional radiology.

METHODS

GPT-4o was asked to answer 370 simulated exam items provided by the Cardiovascular and Interventional Radiology Society of Europe (CIRSE) for EBIR preparation (CIRSE Prep). Subsequently, GPT-4o was asked to generate exam items on interventional radiology topics at difficulty levels suitable for medical students and for the EBIR exam. The generated items were answered by 4 participants: a medical student, a resident, a consultant, and an EBIR holder. The number of correctly answered items was counted. One investigator checked the answers and items generated by GPT-4o for correctness and relevance. The study was conducted from April to July 2024.

RESULTS

GPT-4o correctly answered 248 of the 370 CIRSE Prep items (67.0%). On 50 of the CIRSE Prep items, the medical student answered 46.0%, the resident 42.0%, the consultant 50.0%, and the EBIR holder 74.0% correctly. All participants answered 82.0% to 92.0% of the 50 GPT-4o-generated student-level items correctly. On the 50 GPT-4o-generated EBIR-level items, the medical student answered 32.0%, the resident 44.0%, the consultant 48.0%, and the EBIR holder 66.0% correctly. All participants passed the GPT-4o-generated student-level items, whereas only the EBIR holder passed the GPT-4o-generated EBIR-level items. Two (1.3%) of the 150 items generated by GPT-4o were assessed as implausible.
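The percentages above can be re-derived from raw counts. The following is a minimal Python sketch, not part of the original study: the GPT-4o count (248 of 370) and the implausible-item count (2 of 150) are taken from the abstract, while the per-participant counts out of 50 are back-calculated under the assumption that each reported percentage equals correct answers divided by 50. It also shows that 2 of 150 items corresponds to about 1.3%.

```python
# Re-derive the percentages reported in the RESULTS section.
# Assumption: each 50-item percentage is (correct / 50) * 100, so the
# per-participant counts below are inferred from the reported percentages,
# not taken from the paper's own data tables.


def percent(correct: int, total: int) -> float:
    """Share of correctly answered items, rounded to one decimal place."""
    return round(100 * correct / total, 1)


# GPT-4o on the full CIRSE Prep item set (counts stated in the abstract).
gpt4o_cirse = percent(248, 370)

# Inferred numbers of correct answers out of 50 items per participant.
cirse_prep_50 = {"medical student": 23, "resident": 21, "consultant": 25, "EBIR holder": 37}
gpt_ebir_level_50 = {"medical student": 16, "resident": 22, "consultant": 24, "EBIR holder": 33}

# Items judged implausible among all 150 GPT-4o-generated items.
implausible = percent(2, 150)

if __name__ == "__main__":
    print(f"GPT-4o, CIRSE Prep (370 items): {gpt4o_cirse}%")
    for label, scores in [
        ("CIRSE Prep (50 items)", cirse_prep_50),
        ("GPT-4o EBIR-level (50 items)", gpt_ebir_level_50),
    ]:
        for who, n in scores.items():
            print(f"{label}, {who}: {percent(n, 50)}%")
    print(f"Implausible generated items: {implausible}%")
```

Because each 50-item set moves in steps of 2 percentage points, every reported participant score maps to exactly one whole-number count, which is why the back-calculation is unambiguous.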

CONCLUSION

GPT-4o could pass the simulated written EBIR exam and create exam items of varying difficulty to train medical students and interventional radiologists.


Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/303c/11894030/52aefe1d413f/jeehp-21-21f1.jpg

Similar articles


Evaluating GPT-4o's Performance in the Official European Board of Radiology Exam: A Comprehensive Assessment.

Acad Radiol. 2024 Nov;31(11):4365-4371. doi: 10.1016/j.acra.2024.09.005. Epub 2024 Sep 18.

Generative pre-trained transformer 4o (GPT-4o) in solving text-based multiple response questions for European Diploma in Radiology (EDiR): a comparative study with radiologists.

Insights Imaging. 2025 Mar 22;16(1):66. doi: 10.1186/s13244-025-01941-7.

Evaluating AI proficiency in nuclear cardiology: Large language models take on the board preparation exam.

J Nucl Cardiol. 2025 Mar;45:102089. doi: 10.1016/j.nuclcard.2024.102089. Epub 2024 Nov 29.

ChatGPT-4 Omni Performance in USMLE Disciplines and Clinical Skills: Comparative Analysis.

JMIR Med Educ. 2024 Nov 6;10:e63430. doi: 10.2196/63430.

GPT-4o vs. Human Candidates: Performance Analysis in the Polish Final Dentistry Examination.

Cureus. 2024 Sep 6;16(9):e68813. doi: 10.7759/cureus.68813. eCollection 2024 Sep.

Performance of Progressive Generations of GPT on an Exam Designed for Certifying Physicians as Certified Clinical Densitometrists.

J Clin Densitom. 2024 Apr-Jun;27(2):101480. doi: 10.1016/j.jocd.2024.101480. Epub 2024 Feb 17.

Could ChatGPT Pass the UK Radiology Fellowship Examinations?

Acad Radiol. 2024 May;31(5):2178-2182. doi: 10.1016/j.acra.2023.11.026. Epub 2023 Dec 29.

Assessing ChatGPT for Clinical Decision-Making in Radiation Oncology, With Open-Ended Questions and Images.

Pract Radiat Oncol. 2025 Apr 29. doi: 10.1016/j.prro.2025.04.009.

Diagnostic accuracy of vision-language models on Japanese diagnostic radiology, nuclear medicine, and interventional radiology specialty board examinations.

Jpn J Radiol. 2024 Dec;42(12):1392-1398. doi: 10.1007/s11604-024-01633-0. Epub 2024 Jul 20.

Cited by

Evaluating Large Language Models for Preoperative Patient Education in Superior Capsular Reconstruction: Comparative Study of Claude, GPT, and Gemini.

JMIR Perioper Med. 2025 Jun 12;8:e70047. doi: 10.2196/70047.

Evaluating the performance of GPT-3.5, GPT-4, and GPT-4o in the Chinese National Medical Licensing Examination.

Sci Rep. 2025 Apr 23;15(1):14119. doi: 10.1038/s41598-025-98949-2.

Preliminary assessment of large language models' performance in answering questions on developmental dysplasia of the hip.

J Child Orthop. 2025 Apr 15:18632521251331772. doi: 10.1177/18632521251331772.

AI and Interventional Radiology: A Narrative Review of Reviews on Opportunities, Challenges, and Future Directions.

Diagnostics (Basel). 2025 Apr 1;15(7):893. doi: 10.3390/diagnostics15070893.

Evaluating the performance of ChatGPT in patient consultation and image-based preliminary diagnosis in thyroid eye disease.

Front Med (Lausanne). 2025 Feb 18;12:1546706. doi: 10.3389/fmed.2025.1546706. eCollection 2025.

