Department of Diagnostic and Interventional Radiology, University of Leipzig, Leipzig, Germany.
J Educ Eval Health Prof. 2024;21:21. doi: 10.3352/jeehp.2024.21.21. Epub 2024 Aug 20.
This study aimed to determine whether ChatGPT-4o (GPT-4o), a generative artificial intelligence (AI) platform, could pass a simulated written European Board of Interventional Radiology (EBIR) exam, and whether it could be used to train medical students and interventional radiologists of different levels of expertise by generating exam items on interventional radiology.
GPT-4o was asked to answer 370 simulated exam items of the Cardiovascular and Interventional Radiology Society of Europe (CIRSE) for EBIR preparation (CIRSE Prep). Subsequently, GPT-4o was requested to generate exam items on interventional radiology topics at levels of difficulty suitable for medical students and the EBIR exam. Those generated items were answered by 4 participants, including a medical student, a resident, a consultant, and an EBIR holder. The correctly answered items were counted. One investigator checked the answers and items generated by GPT-4o for correctness and relevance. This work was done from April to July 2024.
GPT-4o correctly answered 248 of the 370 CIRSE Prep items (67.0%). For 50 CIRSE Prep items, the medical student answered 46.0%, the resident 42.0%, the consultant 50.0%, and the EBIR holder 74.0% correctly. All participants answered 82.0% to 92.0% of the 50 GPT-4o-generated items at the student level correctly. For the 50 GPT-4o-generated items at the EBIR level, the medical student answered 32.0%, the resident 44.0%, the consultant 48.0%, and the EBIR holder 66.0% correctly. All participants could pass the GPT-4o-generated items at the student level, whereas only the EBIR holder could pass the GPT-4o-generated items at the EBIR level. Two of the 150 items generated by GPT-4o (1.3%) were assessed as implausible.
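The proportions in the results can be rechecked from the reported counts. The following is a minimal sketch, not the authors' analysis code; the helper names `percent` and `count_from_percent` are hypothetical and introduced only for illustration.

```python
def percent(correct: int, total: int) -> float:
    """Return the share of correct items as a percentage, rounded to one decimal."""
    return round(correct / total * 100, 1)


def count_from_percent(pct: float, total: int) -> int:
    """Recover the number of correct items from a reported percentage."""
    return round(pct / 100 * total)


# GPT-4o on the CIRSE Prep items: 248 correct out of 370.
gpt4o_cirse = percent(248, 370)

# Items assessed as implausible: 2 out of 150 generated items.
implausible = percent(2, 150)

# Participant scores on the 50 CIRSE Prep items, as reported percentages.
cirse_scores = {
    "medical student": 46.0,
    "resident": 42.0,
    "consultant": 50.0,
    "EBIR holder": 74.0,
}
cirse_counts = {who: count_from_percent(p, 50) for who, p in cirse_scores.items()}

if __name__ == "__main__":
    print("GPT-4o, CIRSE Prep:", gpt4o_cirse)
    print("Implausible generated items:", implausible)
    print("Correct answers out of 50:", cirse_counts)
```

Converting the reported percentages back to item counts makes it easier to see how few items separate the participants on a 50-item set.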
GPT-4o could pass the simulated written EBIR exam and create exam items of varying difficulty to train medical students and interventional radiologists.