
ChatGPT-4: An assessment of an upgraded artificial intelligence chatbot in the United States Medical Licensing Examination.

Affiliations

Temerty Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada.

Department of Ophthalmology and Vision Sciences, University of Toronto, Toronto, Ontario, Canada.

Publication Information

Med Teach. 2024 Mar;46(3):366-372. doi: 10.1080/0142159X.2023.2249588. Epub 2023 Oct 15.

DOI: 10.1080/0142159X.2023.2249588
PMID: 37839017
Abstract

PURPOSE

ChatGPT-4 is an upgraded version of an artificial intelligence chatbot. The performance of ChatGPT-4 on the United States Medical Licensing Examination (USMLE) has not been independently characterized. We aimed to assess the performance of ChatGPT-4 at responding to USMLE Step 1, Step 2CK, and Step 3 practice questions.

METHOD

Practice multiple-choice questions for the USMLE Step 1, Step 2CK, and Step 3 were compiled. Of 376 available questions, 319 (85%) were analyzed by ChatGPT-4 on March 21, 2023. Our primary outcome was the performance of ChatGPT-4 for the practice USMLE Step 1, Step 2CK, and Step 3 examinations, measured as the proportion of multiple-choice questions answered correctly. Our secondary outcomes were the mean length of questions and responses provided by ChatGPT-4.

RESULTS

ChatGPT-4 responded to 319 text-based multiple-choice questions from USMLE practice test material. ChatGPT-4 answered 82 of 93 (88%) questions correctly on USMLE Step 1, 91 of 106 (86%) on Step 2CK, and 108 of 120 (90%) on Step 3. ChatGPT-4 provided explanations for all questions. ChatGPT-4 spent 30.8 ± 11.8 s on average responding to practice questions for USMLE Step 1, 23.0 ± 9.4 s per question for Step 2CK, and 23.1 ± 8.3 s per question for Step 3. The mean length of practice USMLE multiple-choice questions that were answered correctly and incorrectly by ChatGPT-4 was similar (difference = 17.48 characters, SE = 59.75, 95% CI = [-100.09, 135.04], t = 0.29, p = 0.77). The mean length of ChatGPT-4's correct responses to practice questions was significantly shorter than the mean length of incorrect responses (difference = 79.58 characters, SE = 35.42, 95% CI = [9.89, 149.28], t = 2.25, p = 0.03).
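The reported figures can be checked arithmetically from the summary values alone, since the test statistic is simply the mean difference divided by its standard error. A minimal Python sketch using only the numbers quoted above (raw per-question data are not public, so only these derived quantities can be verified):

```python
# Sanity checks on the summary statistics reported in the abstract.
# All inputs are the published summary values; no raw data are used.

def t_statistic(difference: float, se: float) -> float:
    """t = mean difference / standard error of the difference."""
    return difference / se

# Accuracy by examination (questions correct / questions attempted)
step1 = 82 / 93        # USMLE Step 1
step2ck = 91 / 106     # USMLE Step 2CK
step3 = 108 / 120      # USMLE Step 3
overall = (82 + 91 + 108) / 319

# Length comparisons, correct vs. incorrect items
t_question_len = t_statistic(17.48, 59.75)  # reported t = 0.29 (p = 0.77)
t_response_len = t_statistic(79.58, 35.42)  # reported t = 2.25 (p = 0.03)

print(f"Step 1: {step1:.0%}  Step 2CK: {step2ck:.0%}  Step 3: {step3:.0%}  overall: {overall:.0%}")
print(f"t (question length) = {t_question_len:.2f}")
print(f"t (response length) = {t_response_len:.2f}")
```

Both computed t values match the abstract's reported statistics to two decimal places, which is consistent with the t values having been derived from the same difference and SE figures.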

CONCLUSIONS

ChatGPT-4 answered a remarkably high proportion of practice questions correctly for USMLE examinations. ChatGPT-4 performed substantially better at USMLE practice questions than previous models of the same AI chatbot.

Similar Articles

1
ChatGPT-4: An assessment of an upgraded artificial intelligence chatbot in the United States Medical Licensing Examination.
Med Teach. 2024 Mar;46(3):366-372. doi: 10.1080/0142159X.2023.2249588. Epub 2023 Oct 15.
2
Performance of an Artificial Intelligence Chatbot in Ophthalmic Knowledge Assessment.
JAMA Ophthalmol. 2023 Jun 1;141(6):589-597. doi: 10.1001/jamaophthalmol.2023.1144.
3
Pure Wisdom or Potemkin Villages? A Comparison of ChatGPT 3.5 and ChatGPT 4 on USMLE Step 3 Style Questions: Quantitative Analysis.
JMIR Med Educ. 2024 Jan 5;10:e51148. doi: 10.2196/51148.
4
Performance of ChatGPT on Ophthalmology-Related Questions Across Various Examination Levels: Observational Study.
JMIR Med Educ. 2024 Jan 18;10:e50842. doi: 10.2196/50842.
5
How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment.
JMIR Med Educ. 2023 Feb 8;9:e45312. doi: 10.2196/45312.
6
Performance of ChatGPT on Nursing Licensure Examinations in the United States and China: Cross-Sectional Study.
JMIR Med Educ. 2024 Oct 3;10:e52746. doi: 10.2196/52746.
7
Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis.
J Med Internet Res. 2024 Jul 25;26:e60807. doi: 10.2196/60807.
8
Assessing ChatGPT 4.0's test performance and clinical diagnostic accuracy on USMLE STEP 2 CK and clinical case reports.
Sci Rep. 2024 Apr 23;14(1):9330. doi: 10.1038/s41598-024-58760-x.
9
Examining ChatGPT Performance on USMLE Sample Items and Implications for Assessment.
Acad Med. 2024 Feb 1;99(2):192-197. doi: 10.1097/ACM.0000000000005549. Epub 2023 Nov 7.
10
Performance of ChatGPT on the Peruvian National Licensing Medical Examination: Cross-Sectional Study.
JMIR Med Educ. 2023 Sep 28;9:e48039. doi: 10.2196/48039.

Cited By

1
AI-Driven Large Language Models in Health Consultations for HIV Patients.
J Multidiscip Healthc. 2025 Aug 25;18:5187-5198. doi: 10.2147/JMDH.S533621. eCollection 2025.
2
Artificial Intelligence and Large Language Models in the Fight Against Superficial Fungal Infections: Friend or Foe?
Clin Cosmet Investig Dermatol. 2025 Aug 20;18:1959-1969. doi: 10.2147/CCID.S522271. eCollection 2025.
3
The performance of ChatGPT on medical image-based assessments and implications for medical education.
BMC Med Educ. 2025 Aug 23;25(1):1192. doi: 10.1186/s12909-025-07752-0.
4
Evaluating the Performance of ChatGPT on Board-Style Examination Questions in Ophthalmology: A Meta-Analysis.
J Med Syst. 2025 Jul 5;49(1):94. doi: 10.1007/s10916-025-02227-7.
5
Evaluation of ChatGPT Performance on Emergency Medicine Board Examination Questions: Observational Study.
JMIR AI. 2025 Mar 12;4:e67696. doi: 10.2196/67696.
6
Evaluation of the Performance of Large Language Models in the Management of Axial Spondyloarthropathy: Analysis of EULAR 2022 Recommendations.
Diagnostics (Basel). 2025 Jun 7;15(12):1455. doi: 10.3390/diagnostics15121455.
7
The sports nutrition knowledge of large language model (LLM) artificial intelligence (AI) chatbots: An assessment of accuracy, completeness, clarity, quality of evidence, and test-retest reliability.
PLoS One. 2025 Jun 13;20(6):e0325982. doi: 10.1371/journal.pone.0325982. eCollection 2025.
8
Evaluating performance of large language models for atrial fibrillation management using different prompting strategies and languages.
Sci Rep. 2025 May 30;15(1):19028. doi: 10.1038/s41598-025-04309-5.
9
Matching Human Expertise: ChatGPT's Performance on Hand Surgery Examinations.
Hand (N Y). 2025 Mar 20:15589447251322914. doi: 10.1177/15589447251322914.
10
ChatGPT 4.0's efficacy in the self-diagnosis of non-traumatic hand conditions.
J Hand Microsurg. 2025 Jan 23;17(3):100217. doi: 10.1016/j.jham.2025.100217. eCollection 2025 May.