ChatGPT Performs at the Level of a Third-Year Orthopaedic Surgery Resident on the Orthopaedic In-Training Examination.

Author information

Ghanem Diane, Covarrubias Oscar, Raad Micheal, LaPorte Dawn, Shafiq Babar

Affiliations

Department of Orthopaedic Surgery, The Johns Hopkins Hospital, Baltimore, Maryland.

School of Medicine, The Johns Hopkins University, Baltimore, Maryland.

Publication information

JB JS Open Access. 2023 Dec 11;8(4). doi: 10.2106/JBJS.OA.23.00103. eCollection 2023 Oct-Dec.

DOI: 10.2106/JBJS.OA.23.00103
PMID: 38638869
Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11025881/
Abstract

INTRODUCTION

Publicly available AI language models such as ChatGPT have demonstrated utility in text generation and even problem-solving when given clear instructions. In light of this transformative shift, this study aims to assess ChatGPT's performance on the Orthopaedic In-Training Examination (OITE).

METHODS

All 213 web-based questions from the 2021 OITE were retrieved from the AAOS-ResStudy website (https://www.aaos.org/education/examinations/ResStudy). Two independent reviewers copied and pasted the questions and response options into ChatGPT Plus (version 4.0) and recorded the generated answers. All media-containing questions were flagged and carefully examined. Twelve media-containing questions that relied purely on images (clinical pictures, radiographs, MRIs, CT scans) and could not be reasoned through from the clinical presentation were excluded. Cohen's kappa coefficient was used to measure agreement between the two reviewers on the ChatGPT-generated responses. Descriptive statistics were used to summarize ChatGPT Plus's performance (% correct). The 2021 norm table was used to compare ChatGPT Plus's OITE performance with that of orthopaedic surgery residents nationally in the same year.
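
Cohen's kappa corrects raw percent agreement for the agreement two raters would reach by chance alone: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of matching labels and p_e is the chance agreement implied by each rater's label frequencies. The abstract does not state which software produced the reported value of 0.947, so the following is only a generic sketch of the standard two-rater formula, not the authors' analysis code. The function name `cohens_kappa` and the ten-question answer lists are hypothetical toy data, not study data.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labelling the same items (hypothetical helper).

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of
    agreement and p_e is the agreement expected by chance from each rater's
    marginal label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters recorded the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of both raters' marginal proportions.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both raters used one identical label for every item; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical toy data: the answer letter each reviewer recorded for ten questions.
    reviewer_1 = list("ABCDABCDAB")
    reviewer_2 = list("ABCDABCDAC")
    print(round(cohens_kappa(reviewer_1, reviewer_2), 3))  # 0.867
```

In this toy example the reviewers disagree on one of ten answers (p_o = 0.9), and the marginal letter frequencies give a chance agreement of p_e = 0.25, so kappa = 0.65 / 0.75 ≈ 0.867. A kappa near 1, such as the study's 0.947, means the two reviewers almost always recorded the same ChatGPT answer well beyond what chance would produce.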

RESULTS

ChatGPT Plus answered a total of 201 questions, of which 92 (45.8%) were media-containing. Agreement between the two reviewers on the 201 ChatGPT-generated responses was excellent, with a Cohen's kappa coefficient of 0.947. ChatGPT Plus had an average overall score of 61.2% (123/201) and scored 64.2% (70/109) on non-media questions. Compared with all orthopaedic surgery residents nationally in 2021, ChatGPT Plus performed at the level of an average PGY3 (third postgraduate year) resident.
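
The counts in the Methods and Results are internally consistent, and together they allow the score on media-containing questions to be back-calculated. The last line below is derived arithmetic from the reported figures, assuming every evaluated question is either media-containing or non-media (as the 92 + 109 = 201 split implies); it is not a value stated in the abstract.

```latex
\begin{aligned}
\text{questions evaluated} &= 213 - 12 = 201 \\
\text{media-containing share} &= \tfrac{92}{201} \approx 45.8\% \\
\text{overall score} &= \tfrac{123}{201} \approx 61.2\% \\
\text{non-media score} &= \tfrac{70}{109} \approx 64.2\%, \qquad 109 = 201 - 92 \\
\text{media score (derived)} &= \tfrac{123 - 70}{92} = \tfrac{53}{92} \approx 57.6\%
\end{aligned}
```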

DISCUSSION

ChatGPT Plus was able to pass the OITE with an overall score of 61.2%, placing it at the level of a third-year orthopaedic surgery resident. It provided logical reasoning and justifications that may help residents improve their understanding of OITE cases and general orthopaedic principles. Further studies are needed to examine its efficacy and its impact on long-term learning and on OITE and American Board of Orthopaedic Surgery (ABOS) examination performance.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c471/11025881/45f49aa37d6a/jbjsoa-8-e23.00103-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c471/11025881/746fe96a631c/jbjsoa-8-e23.00103-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c471/11025881/af366a3a13cc/jbjsoa-8-e23.00103-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c471/11025881/40419d900555/jbjsoa-8-e23.00103-g004.jpg
