Evaluating the performance of ChatGPT-3.5 and ChatGPT-4 on the Taiwan plastic surgery board examination.

Author information

Hsieh Ching-Hua, Hsieh Hsiao-Yun, Lin Hui-Ping

Affiliation

Department of Plastic Surgery, Kaohsiung Chang Gung Memorial Hospital, Chang Gung University and College of Medicine, Kaohsiung, 83301, Taiwan.

Publication information

Heliyon. 2024 Jul 18;10(14):e34851. doi: 10.1016/j.heliyon.2024.e34851. eCollection 2024 Jul 30.

Abstract

BACKGROUND

Chat Generative Pre-Trained Transformer (ChatGPT) is a state-of-the-art large language model that has been evaluated across various medical fields, with mixed performance on licensing examinations. This study aimed to assess the performance of ChatGPT-3.5 and ChatGPT-4 in answering questions from the Taiwan Plastic Surgery Board Examination.

METHODS

The study evaluated the performance of ChatGPT-3.5 and ChatGPT-4 on 1375 questions from the past 8 years of the Taiwan Plastic Surgery Board Examination, including 985 single-choice and 390 multiple-choice questions. We obtained the responses between June and July 2023, launching a new chat session for each question to eliminate memory retention bias.
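The abstract does not state how answers were graded, so the sketch below illustrates one plausible scheme: all-or-nothing credit per question, where a multiple-choice response counts as correct only if it matches the key exactly. The question IDs and answer sets are hypothetical toy data, not the authors' actual pipeline (a minimal sketch under assumptions).

```python
def exam_score(responses, answer_key):
    """Grade exam responses against an answer key.

    Each answer is a set of option letters: a single-choice item has a
    one-element set, a multiple-choice item may have several. Credit is
    all-or-nothing: the response must equal the key set exactly
    (an assumed rule; the abstract does not specify the grading scheme).
    Unanswered questions are counted as incorrect.
    """
    correct = sum(
        1
        for qid, key in answer_key.items()
        if set(responses.get(qid, set())) == set(key)
    )
    return correct / len(answer_key)


# Hypothetical toy data: two single-choice items and one multiple-choice item.
key = {"q1": {"A"}, "q2": {"C"}, "q3": {"B", "D"}}
model_answers = {"q1": {"A"}, "q2": {"B"}, "q3": {"B", "D"}}
rate = exam_score(model_answers, key)  # 2 of 3 questions exactly correct
```

Under this exact-match rule a partially correct multiple-choice response earns nothing, which is consistent with the much lower multiple-choice rates reported in the results.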

RESULTS

Overall, ChatGPT-4 outperformed ChatGPT-3.5, achieving a 59% correct answer rate compared with 41% for ChatGPT-3.5. ChatGPT-4 passed five of the eight yearly exams, whereas ChatGPT-3.5 failed all eight. On single-choice questions, ChatGPT-4 scored 66% correct versus 48% for ChatGPT-3.5; on multiple-choice questions, it achieved a 43% correct rate, nearly double ChatGPT-3.5's 23%.

CONCLUSION

As ChatGPT evolves, its performance on the Taiwan Plastic Surgery Board Examination is expected to improve further. The study suggests potential reforms, such as incorporating more problem-based scenarios, leveraging ChatGPT to refine exam questions, and integrating AI-assisted learning into candidate preparation. These advancements could enhance the assessment of candidates' critical thinking and problem-solving abilities in the field of plastic surgery.

Similar articles

ChatGPT failed Taiwan's Family Medicine Board Exam.
J Chin Med Assoc. 2023 Aug 1;86(8):762-766. doi: 10.1097/JCMA.0000000000000946. Epub 2023 Jun 9.

Performance of ChatGPT and GPT-4 on Neurosurgery Written Board Examinations.
Neurosurgery. 2023 Dec 1;93(6):1353-1365. doi: 10.1227/neu.0000000000002632. Epub 2023 Aug 15.

