
ChatGPT, Bard, and Bing Chat Are Large Language Processing Models That Answered Orthopaedic In-Training Examination Questions With Similar Accuracy to First-Year Orthopaedic Surgery Residents.

Author Information

Guerra Gage A, Hofmann Hayden L, Le Jonathan L, Wong Alexander M, Fathi Amir, Mayfield Cory K, Petrigliano Frank A, Liu Joseph N

Affiliations

USC Epstein Family Center for Sports Medicine, Keck Medicine of USC, Los Angeles, California, U.S.A.

Publication Information

Arthroscopy. 2025 Mar;41(3):557-562. doi: 10.1016/j.arthro.2024.08.023. Epub 2024 Aug 28.

DOI:10.1016/j.arthro.2024.08.023
PMID:39209078
Abstract

PURPOSE

To assess ChatGPT's, Bard's, and Bing Chat's ability to generate accurate orthopaedic diagnoses or corresponding treatments by comparing their performance on the Orthopaedic In-Training Examination (OITE) with that of orthopaedic trainees.

METHODS

OITE question sets from 2021 and 2022 were compiled to form a large set of 420 questions. ChatGPT (GPT-3.5), Bard, and Bing Chat were instructed to select one of the provided responses to each question. The accuracy of composite questions was recorded and comparatively analyzed to human cohorts including medical students and orthopaedic residents, stratified by postgraduate year (PGY).

RESULTS

ChatGPT correctly answered 46.3% of composite questions whereas Bing Chat correctly answered 52.4% of questions and Bard correctly answered 51.4% of questions on the OITE. When image-associated questions were excluded, ChatGPT's, Bing Chat's, and Bard's overall accuracies improved to 49.1%, 53.5%, and 56.8%, respectively. Medical students correctly answered 30.8%, and PGY-1, -2, -3, -4, and -5 orthopaedic residents correctly answered 53.1%, 60.4%, 66.6%, 70.0%, and 71.9%, respectively.

CONCLUSIONS

ChatGPT, Bard, and Bing Chat are artificial intelligence (AI) models that answered OITE questions with accuracy similar to that of first-year orthopaedic surgery residents. ChatGPT, Bard, and Bing Chat achieved this result without using images or other supplementary media that human test takers are provided.

CLINICAL RELEVANCE

Our comparative performance analysis of AI models on orthopaedic board-style questions highlights ChatGPT's, Bing Chat's, and Bard's clinical knowledge and proficiency. Our analysis establishes a baseline of AI model proficiency in the field of orthopaedics and provides a comparative marker for future, more advanced deep learning models. Although in its elementary phase, future AI models' orthopaedic knowledge may provide clinical support and serve as an educational tool.


Similar Articles

1
ChatGPT, Bard, and Bing Chat Are Large Language Processing Models That Answered Orthopaedic In-Training Examination Questions With Similar Accuracy to First-Year Orthopaedic Surgery Residents.
Arthroscopy. 2025 Mar;41(3):557-562. doi: 10.1016/j.arthro.2024.08.023. Epub 2024 Aug 28.
2
Artificial Intelligence in Orthopaedics: Performance of ChatGPT on Text and Image Questions on a Complete AAOS Orthopaedic In-Training Examination (OITE).
J Surg Educ. 2024 Nov;81(11):1645-1649. doi: 10.1016/j.jsurg.2024.08.002. Epub 2024 Sep 14.
3
Can Artificial Intelligence Pass the American Board of Orthopaedic Surgery Examination? Orthopaedic Residents Versus ChatGPT.
Clin Orthop Relat Res. 2023 Aug 1;481(8):1623-1630. doi: 10.1097/CORR.0000000000002704. Epub 2023 May 23.
4
Effectiveness of AI-powered Chatbots in responding to orthopaedic postgraduate exam questions-an observational study.
Int Orthop. 2024 Aug;48(8):1963-1969. doi: 10.1007/s00264-024-06182-9. Epub 2024 Apr 15.
5
Comparitive performance of artificial intelligence-based large language models on the orthopedic in-training examination.
J Orthop Surg (Hong Kong). 2025 Jan-Apr;33(1):10225536241268789. doi: 10.1177/10225536241268789.
6
Generative Artificial Intelligence Performs at a Second-Year Orthopedic Resident Level.
Cureus. 2024 Mar 13;16(3):e56104. doi: 10.7759/cureus.56104. eCollection 2024 Mar.
7
Performance of Two Artificial Intelligence Generative Language Models on the Orthopaedic In-Training Examination.
Orthopedics. 2024 May-Jun;47(3):e146-e150. doi: 10.3928/01477447-20240304-02. Epub 2024 Mar 12.
8
The Rapid Development of Artificial Intelligence: GPT-4's Performance on Orthopedic Surgery Board Questions.
Orthopedics. 2024 Mar-Apr;47(2):e85-e89. doi: 10.3928/01477447-20230922-05. Epub 2023 Sep 27.
9
Comparative Performance of ChatGPT and Bard in a Text-Based Radiology Knowledge Assessment.
Can Assoc Radiol J. 2024 May;75(2):344-350. doi: 10.1177/08465371231193716. Epub 2023 Aug 14.
10
Evaluating Bard Gemini Pro and GPT-4 Vision Against Student Performance in Medical Visual Question Answering: Comparative Case Study.
JMIR Form Res. 2024 Dec 17;8:e57592. doi: 10.2196/57592.

Cited By

1
ChatGPT-4 Responses on Ankle Cartilage Surgery Often Diverge from Expert Consensus: A Comparative Analysis.
Foot Ankle Orthop. 2025 Aug 13;10(3):24730114251352494. doi: 10.1177/24730114251352494. eCollection 2025 Jul.
2
Performance of AI Models vs. Orthopedic Residents in Turkish Specialty Training Development Exams in Orthopedics.
Sisli Etfal Hastan Tip Bul. 2025 Feb 7;59(2):151-155. doi: 10.14744/SEMB.2025.65289. eCollection 2025.
3
Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis.
J Med Internet Res. 2025 Apr 30;27:e64486. doi: 10.2196/64486.