ChatGPT-4: An assessment of an upgraded artificial intelligence chatbot in the United States Medical Licensing Examination.

Author information

Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada.

Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada.

Publication information

Med Teach. 2024 Mar;46(3):366-372. doi: 10.1080/0142159X.2023.2249588. Epub 2023 Oct 15.

Abstract

PURPOSE

ChatGPT-4 is an upgraded version of an artificial intelligence chatbot. The performance of ChatGPT-4 on the United States Medical Licensing Examination (USMLE) has not been independently characterized. We aimed to assess the performance of ChatGPT-4 at responding to USMLE Step 1, Step 2CK, and Step 3 practice questions.

METHOD

Practice multiple-choice questions for the USMLE Step 1, Step 2CK, and Step 3 were compiled. Of 376 available questions, 319 (85%) were analyzed by ChatGPT-4 on March 21, 2023. Our primary outcome was the performance of ChatGPT-4 for the practice USMLE Step 1, Step 2CK, and Step 3 examinations, measured as the proportion of multiple-choice questions answered correctly. Our secondary outcomes were the mean length of questions and responses provided by ChatGPT-4.

RESULTS

ChatGPT-4 responded to 319 text-based multiple-choice questions from USMLE practice test material. ChatGPT-4 answered 82 of 93 (88%) questions correctly on USMLE Step 1, 91 of 106 (86%) on Step 2CK, and 108 of 120 (90%) on Step 3. ChatGPT-4 provided explanations for all questions. ChatGPT-4 spent 30.8 ± 11.8 s on average responding to practice questions for USMLE Step 1, 23.0 ± 9.4 s per question for Step 2CK, and 23.1 ± 8.3 s per question for Step 3. The mean length of practice USMLE multiple-choice questions that were answered correctly and incorrectly by ChatGPT-4 was similar (difference = 17.48 characters, SE = 59.75, 95% CI = [-100.09, 135.04], t = 0.29, p = 0.77). The mean length of ChatGPT-4's correct responses to practice questions was significantly shorter than the mean length of incorrect responses (difference = 79.58 characters, SE = 35.42, 95% CI = [9.89, 149.28], t = 2.25, p = 0.03).
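The reported statistics follow the standard relationship between a difference in means, its standard error, the t statistic, and the 95% confidence interval. A minimal sketch of that arithmetic, using the values from the abstract (the critical value of roughly 1.97 is an assumption for a two-sided 95% interval at the study's sample size; the paper itself would use the exact degrees of freedom):

```python
def t_stat(difference, se):
    """t statistic for a difference in means given its standard error."""
    return difference / se

def ci95(difference, se, t_crit=1.97):
    """Approximate 95% CI: difference ± t_crit * SE (t_crit assumed)."""
    return (difference - t_crit * se, difference + t_crit * se)

# Question length, correct vs. incorrect answers (not significant):
print(round(t_stat(17.48, 59.75), 2))   # ≈ 0.29, matching the reported t

# Response length, correct vs. incorrect answers (significant):
print(round(t_stat(79.58, 35.42), 2))   # ≈ 2.25, matching the reported t
lo, hi = ci95(79.58, 35.42)
print(round(lo, 1), round(hi, 1))       # close to the reported [9.89, 149.28]
```

Since the second interval excludes zero, the corresponding p-value falls below 0.05, consistent with the reported p = 0.03.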

CONCLUSIONS

ChatGPT-4 answered a remarkably high proportion of practice questions correctly for USMLE examinations. ChatGPT-4 performed substantially better at USMLE practice questions than previous models of the same AI chatbot.

