Department of Diagnostic Imaging, Chaim Sheba Medical Center, Ramat Gan, Israel.
Faculty of Medicine, Tel-Aviv University, Tel-Aviv, Israel.
Sci Rep. 2023 Oct 1;13(1):16492. doi: 10.1038/s41598-023-43436-9.
The United States Medical Licensing Examination (USMLE) has been used to study the performance of artificial intelligence (AI) models. However, the performance of these models on questions involving the soft skills assessed by the USMLE remains unexplored. This study aimed to evaluate ChatGPT and GPT-4 on USMLE questions involving communication skills, ethics, empathy, and professionalism. We used 80 USMLE-style questions involving soft skills, taken from the USMLE website and the AMBOSS question bank. A follow-up query was used to assess each model's consistency. The performance of the AI models was compared with that of previous AMBOSS users. GPT-4 outperformed ChatGPT, answering 90% of the questions correctly compared with ChatGPT's 62.5%. GPT-4 also showed greater confidence, revising none of its responses, whereas ChatGPT modified its original answers 82.5% of the time. GPT-4's performance also exceeded that of AMBOSS's past users. Both AI models, notably GPT-4, showed a capacity for empathy, indicating AI's potential to meet the complex interpersonal, ethical, and professional demands intrinsic to the practice of medicine.