ChatGPT Performs at the Level of a Third-Year Orthopaedic Surgery Resident on the Orthopaedic In-Training Examination.

Author information

Ghanem Diane, Covarrubias Oscar, Raad Micheal, LaPorte Dawn, Shafiq Babar

Affiliations

Department of Orthopaedic Surgery, The Johns Hopkins Hospital, Baltimore, Maryland.

School of Medicine, The Johns Hopkins University, Baltimore, Maryland.

Publication information

JB JS Open Access. 2023 Dec 11;8(4). doi: 10.2106/JBJS.OA.23.00103. eCollection 2023 Oct-Dec.

DOI:10.2106/JBJS.OA.23.00103

PMID:38638869

Original article link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11025881/

Abstract

INTRODUCTION

Publicly available AI language models such as ChatGPT have demonstrated utility in text generation and even problem-solving when provided with clear instructions. Amidst this transformative shift, the aim of this study is to assess ChatGPT's performance on the orthopaedic surgery in-training examination (OITE).

METHODS

All 213 OITE 2021 web-based questions were retrieved from the AAOS-ResStudy website (https://www.aaos.org/education/examinations/ResStudy). Two independent reviewers copied and pasted the questions and response options into ChatGPT Plus (version 4.0) and recorded the generated answers. All media-containing questions were flagged and carefully examined. Twelve OITE media-containing questions that relied purely on images (clinical pictures, radiographs, MRIs, CT scans) and could not be rationalized from the clinical presentation were excluded. Cohen's Kappa coefficient was used to examine the agreement of ChatGPT-generated responses between reviewers. Descriptive statistics were used to summarize the performance (% correct) of ChatGPT Plus. The 2021 norm table was used to compare ChatGPT Plus' performance on the OITE to national orthopaedic surgery residents in that same year.
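As an illustration of the agreement statistic named above, the following is a minimal sketch of Cohen's Kappa for two raters who each record one categorical answer per question. It is not the authors' analysis code: the function name `cohens_kappa` and the eight example answers are hypothetical, chosen only to show how observed agreement is corrected for chance agreement.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters labelling the same items (hypothetical helper)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)

    # Observed agreement: fraction of items where both raters recorded the same answer.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each answer option, the product of the two raters'
    # marginal proportions, summed over all options.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical recorded answers for eight questions (not data from the study).
reviewer_1 = ["A", "B", "C", "D", "A", "B", "C", "D"]
reviewer_2 = ["A", "B", "C", "D", "A", "B", "C", "A"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.3f}")
```

In this toy example the reviewers agree on 7 of 8 questions (observed agreement 0.875) and chance agreement is 0.25, so Kappa is (0.875 - 0.25) / (1 - 0.25), roughly 0.833.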

RESULTS

A total of 201 questions were evaluated by ChatGPT Plus. Excellent agreement was observed between raters for the 201 ChatGPT-generated responses, with a Cohen's Kappa coefficient of 0.947. 45.8% (92/201) were media-containing questions. ChatGPT had an average overall score of 61.2% (123/201). Its score was 64.2% (70/109) on non-media questions. When compared to the performance of all national orthopaedic surgery residents in 2021, ChatGPT Plus performed at the level of an average PGY3.
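The reported percentages can be checked from the counts given above. The sketch below recomputes them in Python; the variable names are illustrative, and the media-question score is derived here by subtraction (it is not stated in the abstract), assuming the overall and non-media counts refer to the same 201 questions.

```python
# Counts taken from the abstract.
total_retrieved = 213
excluded_image_only = 12
evaluated = total_retrieved - excluded_image_only  # questions given to ChatGPT Plus

media_questions = 92
non_media_questions = evaluated - media_questions

correct_overall = 123
correct_non_media = 70

# Derived by subtraction, not reported in the abstract (assumption stated above).
correct_media = correct_overall - correct_non_media


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


print("evaluated questions:", evaluated)
print("media share (%):", pct(media_questions, evaluated))
print("overall score (%):", pct(correct_overall, evaluated))
print("non-media score (%):", pct(correct_non_media, non_media_questions))
print("derived media score (%):", pct(correct_media, media_questions))
```

The recomputed values match the abstract (201 questions, 45.8%, 61.2%, 64.2%); the derived media-question score works out to 53/92.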

DISCUSSION

ChatGPT Plus is able to pass the OITE with an overall score of 61.2%, ranking at the level of a third-year orthopaedic surgery resident. It provided logical reasoning and justifications that may help residents improve their understanding of OITE cases and general orthopaedic principles. Further studies are still needed to examine their efficacy and impact on long-term learning and OITE/ABOS performance.

