Hasan Salman, Ipaktchi Kyros, Meyer Nicolas, Liverneaux Philippe
Department of Hand Surgery, Strasbourg University Hospitals, FMTS, 1 Avenue Molière, 67200, Strasbourg, France.
Department of Hand, Upper Extremity & Microvascular Surgery, Dept. of Orthopaedic Surgery, Denver Health Medical Center, 777 Bannock Street, Denver, CO, 80204, USA.
J Hand Microsurg. 2025 May 5;17(4):100258. doi: 10.1016/j.jham.2025.100258. eCollection 2025 Jul.
Certification in hand surgery in Europe (EBHS) and the United States (HSE) requires a subspecialty examination. These exams differ in format, and practice exams, such as those published by the Journal of Hand Surgery (European Volume) and the ASSH, are used for preparation. This study aimed to compare the difficulty of the multiple-choice questions (MCQs) in the EBHS and HSE practice exams, under the assumption that the European MCQs are more challenging. ChatGPT 4.0 answered 94 MCQs (34 from the EBHS and 60 from the HSE practice exams) across five attempts; MCQs with visual aids were excluded. Performance was analyzed both quantitatively (overall and by section) and qualitatively. After being provided with the correct answers, ChatGPT's scores improved by the 5th attempt, from 59% to 71% on the EBHS practice exam and to 97% on the HSE practice exam. The European MCQs proved more difficult, with limited progress (<50% accuracy up to the 5th attempt), while ChatGPT demonstrated better learning on the HSE questions. The complexity of the European MCQs raises questions about the harmonization of certification standards. ChatGPT can help standardize evaluations, though its performance remains inferior to that of humans. The findings confirm the hypothesis that the EBHS MCQs are more challenging than those of the HSE practice exam.
Exploratory study, level of evidence IV.