
Evaluation of ChatGPT Performance on Emergency Medicine Board Examination Questions: Observational Study.

Author Information

Pastrak Mila, Kajitani Sten, Goodings Anthony James, Drewek Austin, LaFree Andrew, Murphy Adrian

Affiliations

School of Medicine, University College Cork, Cork, Ireland.

Department of Emergency Medicine, Johns Hopkins University, Baltimore, MD, United States.

Publication Information

JMIR AI. 2025 Mar 12;4:e67696. doi: 10.2196/67696.

Abstract

BACKGROUND

The ever-evolving field of medicine has highlighted the potential for ChatGPT as an assistive platform. However, its use in medical board examination preparation and completion remains unclear.

OBJECTIVE

This study aimed to evaluate the performance of a custom-modified version of ChatGPT-4, tailored with emergency medicine board examination preparatory materials (Anki flashcard deck), compared to its default version and previous iteration (3.5). The goal was to assess the accuracy of ChatGPT-4 answering board-style questions and its suitability as a tool to aid students and trainees in standardized examination preparation.

METHODS

A comparative analysis was conducted using a random selection of 598 questions from the Rosh In-Training Examination Question Bank. Three versions of ChatGPT were evaluated: the default ChatGPT-4, the custom-modified ChatGPT-4, and ChatGPT-3.5. Accuracy, response length, performance across medical discipline subgroups, and the underlying causes of error were analyzed.
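As a rough illustration of how accuracy between two model configurations on the same question set could be compared, the sketch below runs a chi-squared test on a 2x2 contingency table. The counts and the choice of test are illustrative assumptions, not the paper's reported data or method.

```python
# Minimal sketch: comparing accuracy of two model configurations on the same
# question bank with a chi-squared test. Counts are hypothetical placeholders.
from scipy.stats import chi2_contingency

n_questions = 598  # size of the random sample drawn from the question bank

# Hypothetical numbers of correctly answered questions for each configuration.
custom_correct, default_correct = 540, 545
table = [
    [custom_correct, n_questions - custom_correct],
    [default_correct, n_questions - default_correct],
]

chi2, p_value, dof, _expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p_value:.3f}")  # a large p suggests similar accuracy
```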

RESULTS

The Custom version did not demonstrate a significant improvement in accuracy over the Default version (P=.61), although both significantly outperformed ChatGPT-3.5 (P<.001). The Default version produced significantly longer responses than the Custom version, with mean (SD) response lengths of 1371 (444) and 929 (408), respectively (P<.001). Subgroup analysis revealed no significant difference in performance across medical subdisciplines between the versions (P>.05 in all cases). Both versions of ChatGPT-4 had similar underlying error types (P>.05 in all cases) and a 99% predicted probability of passing, while ChatGPT-3.5 had an 85% probability.
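The abstract does not state how the predicted probability of passing was derived. The sketch below shows one plausible approach, a binomial model with an assumed passing cutoff of 70% of items; both the cutoff and the accuracy values are illustrative assumptions rather than the study's reported figures.

```python
# Minimal sketch: estimating a probability of passing from per-question accuracy
# under a binomial model with an assumed passing cutoff (not the paper's method).
from math import ceil
from scipy.stats import binom

def passing_probability(accuracy: float, n_items: int = 598, cutoff: float = 0.70) -> float:
    """P(number of correct answers >= cutoff * n_items) for a given per-item accuracy."""
    threshold = ceil(cutoff * n_items)
    return float(binom.sf(threshold - 1, n_items, accuracy))  # P(X >= threshold)

# Illustrative accuracies only, not the study's reported values.
for label, acc in [("ChatGPT-4 (Default)", 0.90), ("ChatGPT-3.5", 0.74)]:
    print(f"{label}: estimated pass probability = {passing_probability(acc):.2f}")
```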

CONCLUSIONS

The findings suggest that while newer versions of ChatGPT exhibit improved performance in emergency medicine board examination preparation, specific enhancement with a comprehensive Anki flashcard deck on the topic does not significantly impact accuracy. The study highlights the potential of ChatGPT-4 as a tool for medical education, capable of providing accurate support across a wide range of topics in emergency medicine in its default form.

