Magruder Matthew L, Miskiewicz Michael, Rodriguez Ariel N, Ng Mitchell, Abdelgawad Amr
Department of Orthopaedic Surgery, Maimonides Medical Center, Brooklyn, NY, USA.
Renaissance School of Medicine at Stony Brook University Medical Center, Stony Brook, NY, USA.
Surgeon. 2025 Jun;23(3):187-191. doi: 10.1016/j.surge.2025.04.004. Epub 2025 Apr 22.
Recent advancements in large language model (LLM) artificial intelligence (AI) systems, such as ChatGPT, have demonstrated the ability to answer standardized examination questions, but their performance is variable. The goal of this study was to compare the performance of standard ChatGPT-4 with a custom-trained ChatGPT model on the Orthopaedic Surgery In-Training Examination (OITE).
Practice questions for the 2022 OITE, made available on the AAOS ResStudy website (aaos.org/education/examinations/ResStudy), were used for this study. Question stems were uploaded to both standard ChatGPT-4 and the custom-trained ChatGPT model (Orthopod), and each response was recorded as correct or incorrect. For questions containing media elements, screenshots were converted to PNG files and uploaded to ChatGPT. Evaluation of each model's performance used descriptive statistics to determine the percentage of questions answered correctly.
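The grading workflow described above (record each response as correct or incorrect, then compute the percentage correct) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the question identifiers and grades below are placeholders, not the study data.

```python
def percent_correct(graded):
    """Percentage of correctly answered questions.

    graded: dict mapping a question identifier to True (correct)
    or False (incorrect), as recorded for each model's response.
    """
    if not graded:
        raise ValueError("no graded responses")
    return 100.0 * sum(graded.values()) / len(graded)

# Hypothetical grade sheet for one model over four questions.
grades = {"Q1": True, "Q2": True, "Q3": False, "Q4": True}
print(f"{percent_correct(grades):.2f}% correct")  # 3 of 4 correct
```

The same tally would be kept separately for each model and, per the study design, could be stratified by question category or by presence of media.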
Two hundred and seven questions were analyzed with both ChatGPT-4 and Orthopod. ChatGPT-4 correctly answered 73.43% (152/207) of the questions, while Orthopod correctly answered 71.01% (147/207). There was no significant difference in the performance of either language model based on inclusion of media or question category.
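The reported comparison (152/207 vs. 147/207 correct) can be checked with a two-proportion z-test. This is a hedged illustration using only the counts given in the abstract; the abstract does not state which statistical test the authors actually used.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided two-proportion z-test with pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# ChatGPT-4: 152/207 correct; Orthopod: 147/207 correct
z, p = two_proportion_z_test(152, 207, 147, 207)
print(f"z = {z:.2f}, p = {p:.2f}")  # p well above 0.05
```

The resulting p-value is far above 0.05, consistent with the abstract's finding of no significant difference between the two models.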
ChatGPT-4 and Orthopod correctly answered 73.43% and 71.01% of OITE practice questions, respectively. Both systems provided well-reasoned answers to multiple-choice questions. The thoughtfully articulated responses and well-supported explanations offered by both systems may prove a valuable educational resource for orthopaedic residents as they prepare for upcoming board-style examinations.
Level of evidence: IV.