The Rapid Development of Artificial Intelligence: GPT-4's Performance on Orthopedic Surgery Board Questions.

Publication information

Orthopedics. 2024 Mar-Apr;47(2):e85-e89. doi: 10.3928/01477447-20230922-05. Epub 2023 Sep 27.

Abstract

Advances in artificial intelligence and machine learning models, like Chat Generative Pre-trained Transformer (ChatGPT), have occurred at a remarkably fast rate. OpenAI released its newest model of ChatGPT, GPT-4, in March 2023. It has a wide range of potential medical applications. The model has demonstrated notable proficiency on many medical board examinations. This study sought to assess GPT-4's performance on the Orthopaedic In-Training Examination (OITE), which is used to prepare residents for the American Board of Orthopaedic Surgery (ABOS) Part I Examination. GPT-4's performance was additionally compared with that of the previous iteration of ChatGPT, GPT-3.5, which was released 4 months before GPT-4. GPT-4 correctly answered 251 of the 396 attempted questions (63.4%), whereas GPT-3.5 correctly answered 46.3% of 410 attempted questions. GPT-4 was significantly more accurate than GPT-3.5 on orthopedic board-style questions (P < .00001). GPT-4's performance is most comparable to that of an average third-year orthopedic surgery resident, while GPT-3.5 performed below an average orthopedic intern. GPT-4's overall accuracy was just below the approximate threshold that indicates a likely pass on the ABOS Part I Examination. Our results demonstrate significant improvements in OpenAI's newest model, GPT-4. Future studies should assess potential clinical applications as AI models continue to be trained on larger data sets and offer more capabilities. [Orthopedics. 2024;47(2):e85-e89.]
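The accuracy figures and the GPT-4 versus GPT-3.5 comparison in the abstract can be checked with a short calculation. Two assumptions apply here. First, the abstract reports GPT-3.5's result only as 46.3% of 410 questions, so the count of 190 correct answers below is reconstructed by rounding and is not stated by the authors. Second, the abstract does not name the statistical test behind the reported P value, so the pooled two-proportion z-test used here is a stand-in chosen for illustration, not necessarily the authors' method. The helper name `two_proportion_z_test` is hypothetical.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Hypothetical helper: pooled two-proportion z-test.

    Returns (z, two-sided p-value) for H0: both groups share the same accuracy.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # accuracy under H0, pooling both models
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail of the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# GPT-4: counts reported directly in the abstract
gpt4_correct, gpt4_attempted = 251, 396

# GPT-3.5: only "46.3% of 410" is reported; 190 is a reconstructed count (assumption)
gpt35_attempted = 410
gpt35_correct = round(0.463 * gpt35_attempted)  # 189.83 rounds to 190

z, p = two_proportion_z_test(gpt4_correct, gpt4_attempted, gpt35_correct, gpt35_attempted)
print(f"GPT-4 accuracy:   {gpt4_correct / gpt4_attempted:.1%}")
print(f"GPT-3.5 accuracy: {gpt35_correct / gpt35_attempted:.1%}")
print(f"z = {z:.2f}, two-sided p = {p:.1e}")
```

Run as written, the sketch reproduces the 63.4% and 46.3% accuracies and gives z of roughly 4.9 with a two-sided p on the order of 10^-6, which is consistent with the reported P < .00001. A Pearson chi-square test on the same 2x2 table without continuity correction is equivalent (its statistic equals z squared), so this check does not depend on which of the two was used.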

Similar articles

The Rapid Development of Artificial Intelligence: GPT-4's Performance on Orthopedic Surgery Board Questions.

Orthopedics. 2024 Mar-Apr;47(2):e85-e89. doi: 10.3928/01477447-20230922-05. Epub 2023 Sep 27.

Can Artificial Intelligence Pass the American Board of Orthopaedic Surgery Examination? Orthopaedic Residents Versus ChatGPT.

Clin Orthop Relat Res. 2023 Aug 1;481(8):1623-1630. doi: 10.1097/CORR.0000000000002704. Epub 2023 May 23.

GPT-4 Artificial Intelligence Model Outperforms ChatGPT, Medical Students, and Neurosurgery Residents on Neurosurgery Written Board-Like Questions.

World Neurosurg. 2023 Nov;179:e160-e165. doi: 10.1016/j.wneu.2023.08.042. Epub 2023 Aug 18.

Inadequate Performance of ChatGPT on Orthopedic Board-Style Written Exams.

Cureus. 2024 Jun 18;16(6):e62643. doi: 10.7759/cureus.62643. eCollection 2024 Jun.

Artificial Intelligence in Orthopaedics: Performance of ChatGPT on Text and Image Questions on a Complete AAOS Orthopaedic In-Training Examination (OITE).

J Surg Educ. 2024 Nov;81(11):1645-1649. doi: 10.1016/j.jsurg.2024.08.002. Epub 2024 Sep 14.

OpenAI's GPT-4 performs to a high degree on board-style dermatology questions.

Int J Dermatol. 2024 Jan;63(1):73-78. doi: 10.1111/ijd.16913. Epub 2023 Dec 22.

Performance of ChatGPT Across Different Versions in Medical Licensing Examinations Worldwide: Systematic Review and Meta-Analysis.

J Med Internet Res. 2024 Jul 25;26:e60807. doi: 10.2196/60807.

ChatGPT Earns American Board Certification in Hand Surgery.

Hand Surg Rehabil. 2024 Jun;43(3):101688. doi: 10.1016/j.hansur.2024.101688. Epub 2024 Mar 27.

The performance of ChatGPT on orthopaedic in-service training exams: A comparative study of the GPT-3.5 turbo and GPT-4 models in orthopaedic education.

J Orthop. 2023 Nov 23;50:70-75. doi: 10.1016/j.jor.2023.11.056. eCollection 2024 Apr.

Performance of ChatGPT on Solving Orthopedic Board-Style Questions: A Comparative Analysis of ChatGPT 3.5 and ChatGPT 4.

Clin Orthop Surg. 2024 Aug;16(4):669-673. doi: 10.4055/cios23179. Epub 2024 Mar 7.

Cited by

Systematic Review on Large Language Models in Orthopaedic Surgery.

J Clin Med. 2025 Aug 20;14(16):5876. doi: 10.3390/jcm14165876.

Evaluation of the performance of large language models in endoscopic lumbar surgery: a comparative analysis.

Ann Med Surg (Lond). 2025 Jun 30;87(8):4835-4840. doi: 10.1097/MS9.0000000000003519. eCollection 2025 Aug.

Performance of AI Models vs. Orthopedic Residents in Turkish Specialty Training Development Exams in Orthopedics.

Sisli Etfal Hastan Tip Bul. 2025 Feb 7;59(2):151-155. doi: 10.14744/SEMB.2025.65289. eCollection 2025.

Assessing the role of large language models in adolescent idiopathic scoliosis care: a comparison between ChatGPT and Google Gemini.

Acta Orthop Traumatol Turc. 2025 Jul 18;59(4):222-229. doi: 10.5152/j.aott.2025.25279.

Editorial - Current capacities and future possibilities of large language models in orthopaedic surgery.

J Exp Orthop. 2025 May 26;12(2):e70273. doi: 10.1002/jeo2.70273. eCollection 2025 Apr.

Comparison of ChatGPT and Internet Research for Clinical Research and Decision-Making in Occupational Medicine: Randomized Controlled Trial.

JMIR Form Res. 2025 May 20;9:e63857. doi: 10.2196/63857.

Exploring the role of artificial intelligence in Turkish orthopedic progression exams.

Acta Orthop Traumatol Turc. 2025 Mar 17;59(1):18-26. doi: 10.5152/j.aott.2025.24090.

An Assessment of the Performance of Different Chatbots on Shoulder and Elbow Questions.

J Clin Med. 2025 Mar 27;14(7):2289. doi: 10.3390/jcm14072289.

ChatGPT-3.5 and -4.0 Do Not Reliably Create Readable Patient Education Materials for Common Orthopaedic Upper- and Lower-Extremity Conditions.

Arthrosc Sports Med Rehabil. 2024 Oct 10;7(1):101027. doi: 10.1016/j.asmr.2024.101027. eCollection 2025 Feb.

ChatGPT-4 Performance on German Continuing Medical Education-Friend or Foe (Trick or Treat)? Protocol for a Randomized Controlled Trial.

JMIR Res Protoc. 2025 Feb 6;14:e63887. doi: 10.2196/63887.
