Hsieh Ching-Hua, Hsieh Hsiao-Yun, Lin Hui-Ping
Department of Plastic Surgery, Kaohsiung Chang Gung Memorial Hospital, Chang Gung University and College of Medicine, Kaohsiung, 83301, Taiwan.
Heliyon. 2024 Jul 18;10(14):e34851. doi: 10.1016/j.heliyon.2024.e34851. eCollection 2024 Jul 30.
Chat Generative Pre-Trained Transformer (ChatGPT) is a state-of-the-art large language model that has been evaluated across various medical fields, with mixed performance on licensing examinations. This study aimed to assess the performance of ChatGPT-3.5 and ChatGPT-4 in answering questions from the Taiwan Plastic Surgery Board Examination.
The study evaluated ChatGPT-3.5 and ChatGPT-4 on 1375 questions from the past eight years of the Taiwan Plastic Surgery Board Examination, comprising 985 single-choice and 390 multiple-choice questions. We obtained the responses between June and July 2023, launching a new chat session for each question to eliminate memory-retention bias.
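A minimal sketch of the fresh-session protocol described above, assuming the OpenAI Python client; the model names, prompt handling, and loop are illustrative rather than the authors' actual script. The key point is that each question is sent in its own newly created conversation, so no earlier answer can leak into a later one.

```python
# Illustrative sketch (not the authors' script): each exam question is sent
# in its own, brand-new chat so prior answers cannot influence later ones.
from openai import OpenAI  # assumed client library; pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_fresh_session(question: str, model: str) -> str:
    """Send one exam question in a new chat with no prior context."""
    response = client.chat.completions.create(
        model=model,
        # A fresh message list per call = a fresh chat session.
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content


# Hypothetical usage: query both models on every question independently.
questions = ["<exam question 1>", "<exam question 2>"]  # 1375 in the study
for model in ("gpt-3.5-turbo", "gpt-4"):
    answers = [ask_fresh_session(q, model) for q in questions]
```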
Overall, ChatGPT-4 outperformed ChatGPT-3.5, achieving a 59% correct-answer rate versus 41% for ChatGPT-3.5. ChatGPT-4 passed five of the eight yearly exams, whereas ChatGPT-3.5 failed all eight. On single-choice questions, ChatGPT-4 scored 66% correct versus 48% for ChatGPT-3.5; on multiple-choice questions, it achieved 43%, nearly double ChatGPT-3.5's 23%.
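As a consistency check, the overall rates follow as a weighted average of the per-format rates over the 985 single-choice and 390 multiple-choice questions; the short computation below uses only the figures reported above.

```python
# Weighted-average check: the overall correct rates should follow from the
# per-format rates reported above (985 single-choice, 390 multiple-choice).
n_single, n_multi = 985, 390
total = n_single + n_multi  # 1375 questions

for model, (single_rate, multi_rate) in {
    "ChatGPT-4": (0.66, 0.43),
    "ChatGPT-3.5": (0.48, 0.23),
}.items():
    overall = (n_single * single_rate + n_multi * multi_rate) / total
    print(f"{model}: {overall:.0%}")  # ~59% and ~41%, matching the abstract
```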
As ChatGPT evolves, its performance on the Taiwan Plastic Surgery Board Examination is expected to improve further. The study suggests potential reforms, such as incorporating more problem-based scenarios, leveraging ChatGPT to refine exam questions, and integrating AI-assisted learning into candidate preparation. These advancements could enhance the assessment of candidates' critical thinking and problem-solving abilities in the field of plastic surgery.