The Rapid Development of Artificial Intelligence: GPT-4's Performance on Orthopedic Surgery Board Questions.

Publication information

Orthopedics. 2024 Mar-Apr;47(2):e85-e89. doi: 10.3928/01477447-20230922-05. Epub 2023 Sep 27.


DOI:10.3928/01477447-20230922-05
PMID:37757748
Abstract

Advances in artificial intelligence and machine learning models, like Chat Generative Pre-trained Transformer (ChatGPT), have occurred at a remarkably fast rate. OpenAI released its newest model of ChatGPT, GPT-4, in March 2023. It offers a wide range of medical applications. The model has demonstrated notable proficiency on many medical board examinations. This study sought to assess GPT-4's performance on the Orthopaedic In-Training Examination (OITE) used to prepare residents for the American Board of Orthopaedic Surgery (ABOS) Part I Examination. The data gathered from GPT-4's performance were additionally compared with the data of the previous iteration of ChatGPT, GPT-3.5, which was released 4 months before GPT-4. GPT-4 correctly answered 251 of the 396 attempted questions (63.4%), whereas GPT-3.5 correctly answered 46.3% of 410 attempted questions. GPT-4 was significantly more accurate than GPT-3.5 on orthopedic board-style questions (P<.00001). GPT-4's performance is most comparable to that of an average third-year orthopedic surgery resident, while GPT-3.5 performed below an average orthopedic intern. GPT-4's overall accuracy was just below the approximate threshold that indicates a likely pass on the ABOS Part I Examination. Our results demonstrate significant improvements in OpenAI's newest model, GPT-4. Future studies should assess potential clinical applications as AI models continue to be trained on larger data sets and offer more capabilities. [Orthopedics. 2024;47(2):e85-e89.]
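
The reported accuracies can be checked with a short calculation. The sketch below is illustrative only and rests on two assumptions that are not in the abstract: the GPT-3.5 correct count is taken as 190, the only whole number of 410 that rounds to 46.3%, and the comparison uses a pooled two-proportion z-test, whereas the abstract does not say which test the authors ran. The function name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test (two-sided).

    Assumption: this is a stand-in for whatever test the study used;
    the abstract reports only the P value, not the method.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1, p2, z, p_value


# GPT-4: 251 of 396 correct (from the abstract).
# GPT-3.5: 190 of 410 correct (190 is inferred from 46.3%, not reported).
gpt4_acc, gpt35_acc, z, p_value = two_proportion_z_test(251, 396, 190, 410)

print(f"GPT-4 accuracy:   {gpt4_acc:.1%}")
print(f"GPT-3.5 accuracy: {gpt35_acc:.1%}")
print(f"z = {z:.2f}, two-sided p = {p_value:.2e}")
```

Under these assumptions the two accuracies round to the reported 63.4% and 46.3%, and the two-sided p-value falls below the reported .00001 threshold.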

Similar articles

[1]
The Rapid Development of Artificial Intelligence: GPT-4's Performance on Orthopedic Surgery Board Questions.

Orthopedics. 2024

[2]
Can Artificial Intelligence Pass the American Board of Orthopaedic Surgery Examination? Orthopaedic Residents Versus ChatGPT.

Clin Orthop Relat Res. 2023-8-1

[3]
GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions.

World Neurosurg. 2023-11

[4]
Inadequate Performance of ChatGPT on Orthopedic Board-Style Written Exams.

Cureus. 2024-6-18

[5]
Artificial Intelligence in Orthopaedics: Performance of ChatGPT on Text and Image Questions on a Complete AAOS Orthopaedic In-Training Examination (OITE).

J Surg Educ. 2024-11

[6]
OpenAI's GPT-4 performs to a high degree on board-style dermatology questions.

Int J Dermatol. 2024-1

[7]
Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis.

J Med Internet Res. 2024-7-25

[8]
ChatGPT Earns American Board Certification in Hand Surgery.

Hand Surg Rehabil. 2024-6

[9]
The performance of ChatGPT on orthopaedic in-service training exams: A comparative study of the GPT-3.5 turbo and GPT-4 models in orthopaedic education.

J Orthop. 2023-11-23

[10]
Performance of ChatGPT on Solving Orthopedic Board-Style Questions: A Comparative Analysis of ChatGPT 3.5 and ChatGPT 4.

Clin Orthop Surg. 2024-8

Cited by

[1]
Systematic Review on Large Language Models in Orthopaedic Surgery.

J Clin Med. 2025-8-20

[2]
Evaluation of the performance of large language models in endoscopic lumbar surgery: a comparative analysis.

Ann Med Surg (Lond). 2025-6-30

[3]
Performance of AI Models vs. Orthopedic Residents in Turkish Specialty Training Development Exams in Orthopedics.

Sisli Etfal Hastan Tip Bul. 2025-2-7

[4]
Assessing the role of large language models in adolescent idiopathic scoliosis care: a comparison between ChatGPT and Google Gemini.

Acta Orthop Traumatol Turc. 2025-7-18

[5]
Editorial - Current capacities and future possibilities of large language models in orthopaedic surgery.

J Exp Orthop. 2025-5-26

[6]
Comparison of ChatGPT and Internet Research for Clinical Research and Decision-Making in Occupational Medicine: Randomized Controlled Trial.

JMIR Form Res. 2025-5-20

[7]
Exploring the role of artificial intelligence in Turkish orthopedic progression exams.

Acta Orthop Traumatol Turc. 2025-3-17

[8]
An Assessment of the Performance of Different Chatbots on Shoulder and Elbow Questions.

J Clin Med. 2025-3-27

[9]
ChatGPT-3.5 and -4.0 Do Not Reliably Create Readable Patient Education Materials for Common Orthopaedic Upper- and Lower-Extremity Conditions.

Arthrosc Sports Med Rehabil. 2024-10-10

[10]
ChatGPT-4 Performance on German Continuing Medical Education-Friend or Foe (Trick or Treat)? Protocol for a Randomized Controlled Trial.

JMIR Res Protoc. 2025-2-6
