

Artificial Intelligence in Orthopaedics: Performance of ChatGPT on Text and Image Questions on a Complete AAOS Orthopaedic In-Training Examination (OITE).

Affiliations

Department of Orthopaedic Surgery, Geisinger Commonwealth School of Medicine, Geisinger Musculoskeletal Institute, Danville, PA.


Publication Information

J Surg Educ. 2024 Nov;81(11):1645-1649. doi: 10.1016/j.jsurg.2024.08.002. Epub 2024 Sep 14.

Abstract

OBJECTIVE

Artificial intelligence (AI) is capable of answering complex medical examination questions, offering the potential to revolutionize medical education and healthcare delivery. In this study, we aimed to assess ChatGPT, a model that has demonstrated exceptional performance on standardized exams. Specifically, we evaluated ChatGPT's performance on the complete 2019 Orthopaedic In-Training Examination (OITE), including questions with an image component. We also explored differences in performance between text-only questions and questions with an associated image, including whether the image was described by AI or by a trained orthopaedist.

DESIGN AND SETTING

Questions from the 2019 OITE were input into ChatGPT version 4.0 (GPT-4) using 3 response variants. Because the capacity to input or interpret images was not publicly available in ChatGPT at the time of this study, questions with an image component were supplemented with textual descriptions of the image generated by Microsoft Azure AI Vision Studio or by authors of the study.

RESULTS

ChatGPT performed equally on OITE questions with and without imaging components, answering an average of 49% and 48% of questions correctly, respectively, across all 3 input methods. Performance dropped by 6% when image descriptions were generated by AI. With single-answer multiple-choice input, ChatGPT answered 49% of questions correctly, nearly double the rate expected from random guessing. ChatGPT performed worse than all resident classes on the 2019 exam, scoring 4% lower than PGY-1 residents.

DISCUSSION

ChatGPT performed below all resident classes on the 2019 OITE. Performance on text-only questions and questions with images was nearly equal when the image was described by a trained orthopaedic specialist but decreased when an AI-generated description was used. Recognizing the performance capabilities of AI software may provide insight into the current and future applications of this technology in medical education.

