Dhruv Mendiratta, Isabel Herzog, Rohan Singh, Ashok Para, Tej Joshi, Michael Vosbikian, Neil Kaushal
Department of Orthopaedic Surgery, Rutgers New Jersey Medical School, Newark, New Jersey, USA.
J Exp Orthop. 2025 Jan 2;12(1):e70135. doi: 10.1002/jeo2.70135. eCollection 2025 Jan.
Chat Generative Pre-trained Transformer (ChatGPT) may have potential as a novel educational resource. Opinions differ on the best study resource for the Orthopaedic In-Training Exam (OITE), as the tested material changes from year to year. This study assesses ChatGPT's performance on the OITE to gauge its potential as a study resource for residents.
Questions for the OITE data set were sourced from the American Academy of Orthopaedic Surgeons (AAOS) website. All questions from the 2022 OITE, including those with images, were included in the analysis. Questions were formatted exactly as presented on the AAOS website, with the question stem, narrative text and answer choices each separated by a line break. Each question was evaluated in a new chat session to minimize confounding between items. ChatGPT's answers were characterized by whether they demonstrated logical reasoning, used internal information from the question stem, or drew on external information. Incorrect responses were further categorized as logical, informational or explicit fallacies.
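As an illustration of the one-question-per-fresh-session protocol described above, here is a minimal sketch assuming the OpenAI Python client rather than the ChatGPT web interface the study presumably used; the model name, function name and prompt layout are assumptions for illustration, not the authors' actual setup.

```python
# Illustrative sketch only: reproduces the "one question per fresh chat
# session" idea with the OpenAI Python client. The model name, prompt
# layout and helper name are assumptions, not the study's actual setup.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_in_fresh_session(question: str, choices: list[str]) -> str:
    """Submit one OITE-style item as a brand-new conversation."""
    prompt = question + "\n" + "\n".join(choices)
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model for illustration
        messages=[{"role": "user", "content": prompt}],  # no prior turns
    )
    return response.choices[0].message.content

# Because each item is sent with no prior messages, earlier answers
# cannot influence later ones (the confound-minimization step above).
```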
ChatGPT yielded an overall success rate of 48.3% on the 2022 AAOS OITE. ChatGPT demonstrated the ability to apply logic and stepwise thinking in 67.6% of the questions. ChatGPT effectively utilized internal information from the question stem in 68.1% of the questions and incorporated external information in 68.1% of the questions. The utilization of logical reasoning (p < 0.001), internal information (p = 0.004) and external information (p = 0.009) was greater among correct responses than incorrect responses. Informational fallacy was the most common shortcoming of ChatGPT's responses. There was no difference in correct responses based on whether or not an image was present (p = 0.320).
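The comparisons reported above are differences in proportions between correct and incorrect responses. The abstract does not name the statistical test used, but a chi-square test of independence on a 2x2 contingency table is one standard way to obtain such p-values; the sketch below uses invented counts purely for illustration.

```python
# Hypothetical worked example of comparing a response feature (e.g.,
# logical reasoning) between correct and incorrect answers. Counts are
# invented; the paper does not state which test it used.
from scipy.stats import chi2_contingency

#        feature present, feature absent
table = [[80, 20],   # correct responses (hypothetical counts)
         [45, 55]]   # incorrect responses (hypothetical counts)

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")  # small p => proportions differ
```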
ChatGPT demonstrates logical, informational and explicit fallacies that, at this time, may lead to misinformation and hinder resident education.
Level of Evidence: Level V.