ChatGPT's Performance on the Hand Surgery Self-Assessment Exam: A Critical Analysis.

PURPOSE: To assess the performance of Chat Generative Pre-Trained Transformer (ChatGPT) when answering self-assessment exam questions in hand surgery and to compare its accuracy on text-only questions with its accuracy on questions that included images.
METHODS: This study used 10 self-assessment exams from 2004 to 2013 provided by the American Society for Surgery of the Hand (ASSH). ChatGPT's performance on text-only questions and image-based questions was compared. The primary outcomes were ChatGPT's total score, score on text-only questions, and score on image-based questions. The secondary outcomes were the proportion of questions for which ChatGPT provided additional explanations, the length of those elaborations, and the number of questions for which ChatGPT provided answers with certainty.
RESULTS: Out of 1,583 questions, ChatGPT answered 573 (36.2%) correctly. ChatGPT performed better on text-only questions than on image-based questions. Out of 1,127 text-only questions, ChatGPT answered 442 (39.2%) correctly. Out of 456 image-based questions, it answered 131 (28.7%) correctly. The proportion of questions with elaborations did not differ between text-only and image-based questions. Elaboration length did not differ between questions ChatGPT answered correctly and those it answered incorrectly, but elaborations for image-based questions were longer than those for text-only questions. Out of 1,441 confident answers, 548 (38.0%) were correct; out of 142 unconfident answers, 25 (17.6%) were correct.
CONCLUSIONS: ChatGPT performed poorly on the ASSH self-assessment exams from 2004 to 2013. It performed better on text-only questions. Even with its highest score of 42% for the year 2012, the AI platform would not have received continuing medical education credit from ASSH or the American Board of Surgery. Even when only considering questions without images, ChatGPT's high score of 44% correct would not have "passed" the examination.
CLINICAL RELEVANCE: At this time, medical professionals, trainees, and patients should use ChatGPT with caution as the program has not yet developed proficiency with hand subspecialty knowledge.
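
The percentages in the Results can be re-derived from the reported counts. The following is a minimal Python sketch for readers who want to check the arithmetic. The dictionary keys, the helper names `percent` and `two_proportion_z`, and the pooled two-proportion z statistic are illustrative assumptions: the abstract does not state which statistical test the authors used, so the z statistic here is not a reproduction of their analysis.

```python
import math

# Counts (correct, total) as reported in the abstract's Results section.
counts = {
    "all questions": (573, 1583),
    "text-only": (442, 1127),
    "image-based": (131, 456),
    "confident answers": (548, 1441),
    "unconfident answers": (25, 142),
}


def percent(correct, total):
    """Percentage correct, rounded to one decimal place as in the abstract."""
    return round(100 * correct / total, 1)


percentages = {name: percent(c, n) for name, (c, n) in counts.items()}


def two_proportion_z(c1, n1, c2, n2):
    """Pooled two-proportion z statistic.

    Assumption: chosen here for illustration only; the abstract does not
    name the test used to compare text-only and image-based accuracy.
    """
    pooled = (c1 + c2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (c1 / n1 - c2 / n2) / se


z_text_vs_image = two_proportion_z(*counts["text-only"], *counts["image-based"])

if __name__ == "__main__":
    for name, pct in percentages.items():
        print(f"{name}: {pct}%")
    print(f"z (text-only vs image-based): {z_text_vs_image:.2f}")
```

The text-only and image-based counts sum to the overall totals (442 + 131 = 573 correct out of 1,127 + 456 = 1,583), as do the confident and unconfident counts, so the reported subgroups are internally consistent.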
