The performance of ChatGPT on orthopaedic in-service training exams: A comparative study of the GPT-3.5 turbo and GPT-4 models in orthopaedic education.

Author Information

Rizzo Michael G, Cai Nathan, Constantinescu David

Affiliations

University of Miami Hospital, Department of Orthopaedic Surgery, 1611 NW 12th Ave #303, Miami, FL, 33136, USA.

The University of Miami Leonard M. Miller School of Medicine, Department of Education, 1600 NW 10th Ave #1140, Miami, FL, 33136, USA.

Publication Information

J Orthop. 2023 Nov 23;50:70-75. doi: 10.1016/j.jor.2023.11.056. eCollection 2024 Apr.


DOI: 10.1016/j.jor.2023.11.056
PMID: 38173829
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10758621/
Abstract

INTRODUCTION: The rapid advancement of artificial intelligence (AI), particularly the development of Large Language Models (LLMs) such as Generative Pretrained Transformers (GPTs), has revolutionized numerous fields. The purpose of this study is to investigate the application of LLMs within the realm of orthopaedic in-training examinations.

METHODS: Questions from the 2020-2022 Orthopaedic In-Service Training Exams (OITEs) were given to OpenAI's GPT-3.5 Turbo and GPT-4 LLMs, using a zero-shot inference approach. Each model was given a multiple-choice question, without prior exposure to similar queries, and its generated response was compared to the correct answer within each OITE. The models were evaluated on overall accuracy, performance on questions with and without media, and performance on first- and higher-order questions.

RESULTS: The GPT-4 model outperformed the GPT-3.5 Turbo model across all years and question categories (2022: 67.63% vs. 50.24%; 2021: 58.69% vs. 47.42%; 2020: 59.53% vs. 46.51%). Both models performed better on questions without associated media, with GPT-4 attaining accuracies of 68.80%, 65.14%, and 68.22% for 2022, 2021, and 2020, respectively. GPT-4 outscored GPT-3.5 Turbo on first-order questions across all years (2022: 63.83% vs. 38.30%; 2021: 57.45% vs. 50.00%; 2020: 65.74% vs. 53.70%). GPT-4 also outscored GPT-3.5 Turbo on higher-order questions across all years (2022: 68.75% vs. 53.75%; 2021: 59.66% vs. 45.38%; 2020: 53.27% vs. 39.25%).

DISCUSSION: GPT-4 showed improved performance compared to GPT-3.5 Turbo in all tested categories. The results reflect both the potential and the limitations of AI in orthopaedics. GPT-4's performance is comparable to a second-to-third-year resident and GPT-3.5 Turbo's performance is comparable to a first-year resident, suggesting that current LLMs can neither pass the OITE nor substitute for orthopaedic training. This study sets a precedent for future endeavors integrating GPT models into orthopaedic education and underlines the necessity of specialized training of these models for specific medical domains.
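The scoring step described in the methods (comparing each model's letter choice to the answer key, then breaking accuracy down by question category such as first- vs. higher-order) can be sketched in a few lines of Python. The paper does not publish its evaluation harness, so the function names and data layout below are illustrative assumptions, not the authors' code:

```python
from collections import defaultdict

def overall_accuracy(responses, answer_key):
    """Percent of questions where the model's letter choice matches the answer key."""
    correct = sum(responses.get(q) == a for q, a in answer_key.items())
    return round(100.0 * correct / len(answer_key), 2)

def accuracy_by_category(responses, answer_key, categories):
    """Accuracy broken down by a per-question label, e.g. 'first-order' vs 'higher-order'."""
    totals, hits = defaultdict(int), defaultdict(int)
    for q, a in answer_key.items():
        label = categories[q]
        totals[label] += 1
        hits[label] += responses.get(q) == a
    return {label: round(100.0 * hits[label] / totals[label], 2) for label in totals}
```

For example, with hypothetical data where a model answers three of four questions correctly, missing one higher-order question:

```python
resp = {"q1": "A", "q2": "C", "q3": "D", "q4": "D"}
key  = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
cats = {"q1": "first-order", "q2": "first-order",
        "q3": "higher-order", "q4": "higher-order"}
overall_accuracy(resp, key)                 # 75.0
accuracy_by_category(resp, key, cats)       # {'first-order': 100.0, 'higher-order': 50.0}
```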


Similar Articles

[1]
The performance of ChatGPT on orthopaedic in-service training exams: A comparative study of the GPT-3.5 turbo and GPT-4 models in orthopaedic education.

J Orthop. 2023-11-23

[2]
The Rapid Development of Artificial Intelligence: GPT-4's Performance on Orthopedic Surgery Board Questions.

Orthopedics. 2024

[3]
Harnessing advanced large language models in otolaryngology board examinations: an investigation using python and application programming interfaces.

Eur Arch Otorhinolaryngol. 2025-4-25

[4]
Large language models (LLMs) in radiology exams for medical students: Performance and consequences.

Rofo. 2024-11-4

[5]
Artificial Intelligence in Orthopaedics: Performance of ChatGPT on Text and Image Questions on a Complete AAOS Orthopaedic In-Training Examination (OITE).

J Surg Educ. 2024-11

[6]
Comparitive performance of artificial intelligence-based large language models on the orthopedic in-training examination.

J Orthop Surg (Hong Kong). 2025

[7]
Artificial Intelligence in Medical Education: Comparative Analysis of ChatGPT, Bing, and Medical Students in Germany.

JMIR Med Educ. 2023-9-4

[8]
Assessing GPT-4's Performance in Delivering Medical Advice: Comparative Analysis With Human Experts.

JMIR Med Educ. 2024-7-8

[9]
Comparison of ChatGPT-3.5, ChatGPT-4, and Orthopaedic Resident Performance on Orthopaedic Assessment Examinations.

J Am Acad Orthop Surg. 2023-12-1

[10]
Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard.

JMIR Med Educ. 2024-2-21

Cited By

[1]
Systematic Review on Large Language Models in Orthopaedic Surgery.

J Clin Med. 2025-8-20

[2]
Evaluating the accuracy of CHATGPT models in answering multiple-choice questions on oral and maxillofacial pathologies and oral radiology.

Digit Health. 2025-7-8

[3]
Modern artificial intelligence and large language models in graduate medical education: a scoping review of attitudes, applications & practice.

BMC Med Educ. 2025-5-20

[4]
Evaluating retrieval augmented generation and ChatGPT's accuracy on orthopaedic examination assessment questions.

Ann Jt. 2025-4-22

[5]
Answering Patterns in SBA Items: Students, GPT3.5, and Gemini.

Med Sci Educ. 2024-11-26

[6]
Exploring the role of artificial intelligence in Turkish orthopedic progression exams.

Acta Orthop Traumatol Turc. 2025-3-17

[7]
Exploring the Current Applications of Artificial Intelligence in Orthopaedic Surgical Training: A Systematic Scoping Review.

Cureus. 2025-4-3

[8]
Semantic Clinical Artificial Intelligence vs Native Large Language Model Performance on the USMLE.

JAMA Netw Open. 2025-4-1

[9]
ChatGPT 4.0's efficacy in the self-diagnosis of non-traumatic hand conditions.

J Hand Microsurg. 2025-1-23

[10]
Evaluating ChatGPT-4o as a decision support tool in multidisciplinary sarcoma tumor boards: heterogeneous performance across various specialties.

Front Oncol. 2025-1-17

References

[1]
GPT-4 passes the bar exam.

Philos Trans A Math Phys Eng Sci. 2024-4-15

[2]
Performance of ChatGPT and GPT-4 on Neurosurgery Written Board Examinations.

Neurosurgery. 2023-12-1

[3]
ChatGPT takes on the European Exam in Core Cardiology: an artificial intelligence success story?

Eur Heart J Digit Health. 2023-4-24

[4]
Comparison of GPT-3.5, GPT-4, and human user performance on a practice ophthalmology written examination.

Eye (Lond). 2023-12

[5]
ChatGPT Is Equivalent to First-Year Plastic Surgery Residents: Evaluation of ChatGPT on the Plastic Surgery In-Service Examination.

Aesthet Surg J. 2023-11-16

[6]
The rise of ChatGPT: Exploring its potential in medical education.

Anat Sci Educ. 2024

[7]
Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models.

PLOS Digit Health. 2023-2-9

[8]
How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment.

JMIR Med Educ. 2023-2-8

[9]
Initial impressions of ChatGPT for anatomy education.

Anat Sci Educ. 2024-3

[10]
The Orthopaedic In-Training Examination (OITE).

Clin Orthop Relat Res. 1971
