Performance of Two Artificial Intelligence Generative Language Models on the Orthopaedic In-Training Examination.

Publication information

Orthopedics. 2024 May-Jun;47(3):e146-e150. doi: 10.3928/01477447-20240304-02. Epub 2024 Mar 12.

DOI:10.3928/01477447-20240304-02
PMID:38466827
Abstract

BACKGROUND

Artificial intelligence (AI) generative large language models are powerful and increasingly accessible tools with potential applications in health care education and training. The annual Orthopaedic In-Training Examination (OITE) is widely used to assess resident academic progress and preparation for the American Board of Orthopaedic Surgery Part 1 Examination.

MATERIALS AND METHODS

OpenAI's ChatGPT and Google's Bard generative language models were administered the 2022 OITE. Question stems that contained images were input first without, and then with, a text-based description of the imaging findings.

RESULTS

ChatGPT answered 69.1% of questions correctly. When provided with text describing accompanying media, this increased to 77.8% correct. In contrast, Bard answered 49.8% of questions correctly. This increased to 58% correct when text describing imaging in question stems was provided (P<.0001). ChatGPT was most accurate in questions within the shoulder category, with 90.9% correct. Bard performed best in the sports category, with 65.4% correct. ChatGPT performed above the published mean of Accreditation Council for Graduate Medical Education orthopedic resident test-takers (66%).
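
The reported accuracies can be restated as percentage-point differences. The snippet below is a minimal sketch that uses only the figures quoted in the abstract. The variable and function names are illustrative assumptions, and because the abstract does not give the number of questions, no significance test is attempted.

```python
# Accuracies (percent correct) exactly as reported in the abstract.
reported = {
    "ChatGPT": {"without_description": 69.1, "with_description": 77.8},
    "Bard": {"without_description": 49.8, "with_description": 58.0},
}
RESIDENT_MEAN = 66.0  # published resident mean cited in the abstract


def summarize(scores, resident_mean):
    """Return percentage-point gains and margins versus residents per model."""
    summary = {}
    for model, s in scores.items():
        without = s["without_description"]
        with_text = s["with_description"]
        summary[model] = {
            # Gain from adding a text description of the imaging findings.
            "gain_from_description": round(with_text - without, 1),
            # Margin relative to the resident mean, for both input conditions.
            "margin_without": round(without - resident_mean, 1),
            "margin_with": round(with_text - resident_mean, 1),
        }
    return summary


summary = summarize(reported, RESIDENT_MEAN)
for model, row in summary.items():
    print(model, row)
```

Under these figures, both models gain roughly the same amount from the text descriptions (8.7 and 8.2 percentage points), while only ChatGPT exceeds the resident mean in either condition.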

CONCLUSION

There is significant variability in the accuracy of publicly available AI models on the OITE. AI generative language software may play numerous potential roles in the future in orthopedic education, including simulating patient presentations and clinical scenarios, customizing individual learning plans, and driving evidence-based case discussion. Further research and collaboration within the orthopedic community are required to safely adopt these tools and minimize risks associated with their use. [Orthopedics. 2024;47(3):e146-e150.]

Similar articles

1. Performance of Two Artificial Intelligence Generative Language Models on the Orthopaedic In-Training Examination.
   Orthopedics. 2024 May-Jun;47(3):e146-e150. doi: 10.3928/01477447-20240304-02. Epub 2024 Mar 12.
2. Artificial Intelligence in Orthopaedics: Performance of ChatGPT on Text and Image Questions on a Complete AAOS Orthopaedic In-Training Examination (OITE).
   J Surg Educ. 2024 Nov;81(11):1645-1649. doi: 10.1016/j.jsurg.2024.08.002. Epub 2024 Sep 14.
3. Can Artificial Intelligence Pass the American Board of Orthopaedic Surgery Examination? Orthopaedic Residents Versus ChatGPT.
   Clin Orthop Relat Res. 2023 Aug 1;481(8):1623-1630. doi: 10.1097/CORR.0000000000002704. Epub 2023 May 23.
4. Generative Artificial Intelligence Performs at a Second-Year Orthopedic Resident Level.
   Cureus. 2024 Mar 13;16(3):e56104. doi: 10.7759/cureus.56104. eCollection 2024 Mar.
5. ChatGPT, Bard, and Bing Chat Are Large Language Processing Models That Answered Orthopaedic In-Training Examination Questions With Similar Accuracy to First-Year Orthopaedic Surgery Residents.
   Arthroscopy. 2025 Mar;41(3):557-562. doi: 10.1016/j.arthro.2024.08.023. Epub 2024 Aug 28.
6. Comparison of Artificial Intelligence to Resident Performance on Upper-Extremity Orthopaedic In-Training Examination Questions.
   J Hand Surg Glob Online. 2023 Dec 11;6(2):164-168. doi: 10.1016/j.jhsg.2023.10.013. eCollection 2024 Mar.
7. The Rapid Development of Artificial Intelligence: GPT-4's Performance on Orthopedic Surgery Board Questions.
   Orthopedics. 2024 Mar-Apr;47(2):e85-e89. doi: 10.3928/01477447-20230922-05. Epub 2023 Sep 27.
8. Effectiveness of AI-powered Chatbots in responding to orthopaedic postgraduate exam questions-an observational study.
   Int Orthop. 2024 Aug;48(8):1963-1969. doi: 10.1007/s00264-024-06182-9. Epub 2024 Apr 15.
9. Can generative artificial intelligence pass the orthopaedic board examination?
   J Orthop. 2023 Nov 5;53:27-33. doi: 10.1016/j.jor.2023.10.026. eCollection 2024 Jul.
10. Comparison of ChatGPT-3.5, ChatGPT-4, and Orthopaedic Resident Performance on Orthopaedic Assessment Examinations.
    J Am Acad Orthop Surg. 2023 Dec 1;31(23):1173-1179. doi: 10.5435/JAAOS-D-23-00396. Epub 2023 Sep 4.

Cited by

1. ChatGPT-4o is Not a Reliable Study Source for Orthopaedic Surgery Residents.
   JB JS Open Access. 2025 Sep 11;10(3). doi: 10.2106/JBJS.OA.25.00112. eCollection 2025 Jul-Sep.
2. Systematic Review on Large Language Models in Orthopaedic Surgery.
   J Clin Med. 2025 Aug 20;14(16):5876. doi: 10.3390/jcm14165876.
3. Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis.
   J Med Internet Res. 2025 Apr 30;27:e64486. doi: 10.2196/64486.
4. Enhancing Patient Education on Cardiovascular Rehabilitation with Large Language Models.
   Mo Med. 2025 Jan-Feb;122(1):67-71.
5. Exploring prospects, hurdles, and road ahead for generative artificial intelligence in orthopedic education and training.
   BMC Med Educ. 2024 Dec 28;24(1):1544. doi: 10.1186/s12909-024-06592-8.
6. Unravelling Orthopaedic Surgeons' Perceptions and Adoption of Generative AI Technologies.
   J CME. 2024 Dec 9;13(1):2437330. doi: 10.1080/28338073.2024.2437330. eCollection 2024.
7. Chatbot Demonstrates Moderate Interrater Reliability in Billing for Hand Surgery Clinic Encounters.
   Hand (N Y). 2024 Nov 16:15589447241295328. doi: 10.1177/15589447241295328.