Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada.
Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada.
Med Teach. 2024 Mar;46(3):366-372. doi: 10.1080/0142159X.2023.2249588. Epub 2023 Oct 15.
ChatGPT-4 is an upgraded version of an artificial intelligence chatbot. The performance of ChatGPT-4 on the United States Medical Licensing Examination (USMLE) has not been independently characterized. We aimed to assess the performance of ChatGPT-4 at responding to USMLE Step 1, Step 2CK, and Step 3 practice questions.
Practice multiple-choice questions for the USMLE Step 1, Step 2CK, and Step 3 were compiled. Of 376 available questions, 319 (85%) were analyzed by ChatGPT-4 on March 21, 2023. Our primary outcome was the performance of ChatGPT-4 on the practice USMLE Step 1, Step 2CK, and Step 3 examinations, measured as the proportion of multiple-choice questions answered correctly. Our secondary outcomes were the mean lengths of the practice questions and of the responses provided by ChatGPT-4.
ChatGPT-4 responded to 319 text-based multiple-choice questions from USMLE practice test material. ChatGPT-4 answered 82 of 93 (88%) questions correctly on USMLE Step 1, 91 of 106 (86%) on Step 2CK, and 108 of 120 (90%) on Step 3. ChatGPT-4 provided explanations for all questions. ChatGPT-4 spent 30.8 ± 11.8 s on average responding to practice questions for USMLE Step 1, 23.0 ± 9.4 s per question for Step 2CK, and 23.1 ± 8.3 s per question for Step 3. The mean length of practice USMLE multiple-choice questions that were answered correctly and incorrectly by ChatGPT-4 was similar (difference = 17.48 characters, SE = 59.75, 95% CI [-100.09, 135.04], t = 0.29, p = 0.77). The mean length of ChatGPT-4's correct responses to practice questions was significantly shorter than the mean length of incorrect responses (difference = 79.58 characters, SE = 35.42, 95% CI [9.89, 149.28], t = 2.25, p = 0.03).
ChatGPT-4 answered a remarkably high proportion of practice questions correctly for USMLE examinations. ChatGPT-4 performed substantially better at USMLE practice questions than previous models of the same AI chatbot.