Hasan Salman, Ipaktchi Kyros, Meyer Nicolas, Liverneaux Philippe
Department of Hand Surgery, Strasbourg University Hospitals, FMTS, 1 Avenue Molière, 67200, Strasbourg, France.
Department of Hand, Upper Extremity & Microvascular Surgery, Dept. of Orthopaedic Surgery, Denver Health Medical Center, 777 Bannock Street, Denver, CO, 80204, USA.
J Hand Microsurg. 2025 May 5;17(4):100258. doi: 10.1016/j.jham.2025.100258. eCollection 2025 Jul.
Certification in hand surgery in Europe (EBHS) and the United States (HSE) requires a subspecialty examination. These exams differ in format, and practice exams, such as those published by the Journal of Hand Surgery (European Volume) and the ASSH, are used for preparation. This study aimed to compare the difficulty of the multiple-choice questions (MCQs) in the EBHS and HSE practice exams, under the assumption that the European MCQs are more challenging. ChatGPT 4.0 answered 94 MCQs (34 from the EBHS and 60 from the HSE practice exams) across five attempts; MCQs with visual aids were excluded. Performance was analyzed both quantitatively (overall and by section) and qualitatively. After being provided with the correct answers, ChatGPT's scores improved by the 5th attempt, from 59% to 71% on the EBHS practice exam and to 97% on the HSE practice exam. The European MCQs proved more difficult, with limited progress (<50% accuracy up to the 5th attempt), while ChatGPT demonstrated better learning on the HSE questions. The complexity of the European MCQs raises questions about the harmonization of certification standards. ChatGPT can help standardize evaluations, though its performance remains inferior to that of humans. The findings confirm the hypothesis that the EBHS MCQs are more challenging than those of the HSE practice exam.
Exploratory study, level of evidence IV.