Dhruv Mendiratta, Isabel Herzog, Rohan Singh, Ashok Para, Tej Joshi, Michael Vosbikian, Neil Kaushal
Department of Orthopaedic Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, USA.
J Exp Orthop. 2025 Jan 2;12(1):e70135. doi: 10.1002/jeo2.70135. eCollection 2025 Jan.
Chat Generative Pre-trained Transformer (ChatGPT) may have potential as a novel educational resource. Opinions differ on the best study resource for the Orthopaedic In-Training Exam (OITE), as the tested material changes from year to year. This study assesses ChatGPT's performance on the OITE to gauge its potential as a study resource for residents.
Questions for the OITE data set were sourced from the American Academy of Orthopaedic Surgeons (AAOS) website. All questions from the 2022 OITE, including those with images, were included in the analysis. Questions were formatted exactly as presented on the AAOS website, with the question stem, narrative text and answer choices each separated by a line break. Each question was evaluated in a new chat session to minimize confounding between items. ChatGPT's answers were characterized by whether they demonstrated logical reasoning, used internal information from the question stem, or drew on external information. Incorrect responses were further categorized as logical, informational or explicit fallacies.
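As an illustration of the one-question-per-fresh-session protocol described above, here is a minimal sketch assuming the OpenAI Python client rather than the ChatGPT web interface the study presumably used; the model name, function name and prompt layout are assumptions for illustration, not the authors' actual setup.

```python
# Illustrative sketch only: reproduces the "one question per fresh chat
# session" idea with the OpenAI Python client. The model name, prompt
# layout and helper name are assumptions, not the study's actual setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_in_fresh_session(question: str, choices: list[str]) -> str:
    """Submit one OITE-style item as a brand-new conversation."""
    prompt = question + "\n" + "\n".join(choices)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model for illustration
        messages=[{"role": "user", "content": prompt}],  # no prior turns
    )
    return response.choices[0].message.content

# Because each item is sent with no prior messages, earlier answers
# cannot influence later ones (the confound-minimization step above).
```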
ChatGPT yielded an overall success rate of 48.3% on the 2022 AAOS OITE. ChatGPT demonstrated the ability to apply logic and stepwise thinking in 67.6% of the questions. ChatGPT effectively utilized internal information from the question stem in 68.1% of the questions and incorporated external information in 68.1% of the questions. The utilization of logical reasoning (p < 0.001), internal information (p = 0.004) and external information (p = 0.009) was greater among correct responses than incorrect responses. Informational fallacy was the most common shortcoming of ChatGPT's responses. There was no difference in correct responses based on whether or not an image was present (p = 0.320).
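The comparisons reported above are differences in proportions between correct and incorrect responses. The abstract does not name the statistical test used, but a chi-square test of independence on a 2x2 contingency table is one standard way to obtain such p-values; the sketch below uses invented counts purely for illustration.

```python
# Hypothetical worked example of comparing a response feature (e.g.,
# logical reasoning) between correct and incorrect answers. Counts are
# invented; the paper does not state which test it used.
from scipy.stats import chi2_contingency

#        feature present, feature absent
table = [[80, 20],   # correct responses (hypothetical counts)
         [45, 55]]   # incorrect responses (hypothetical counts)

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # small p => proportions differ
```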
ChatGPT demonstrates logical, informational and explicit fallacies that, at this time, may lead to misinformation and hinder resident education.
Level of Evidence: Level V.