
ChatGPT failed Taiwan's Family Medicine Board Exam.

Affiliations

Center for Geriatrics and Gerontology, Taipei Veterans General Hospital, Taipei, Taiwan, ROC.

Institute of Public Health, School of Medicine, National Yang Ming Chiao Tung University, Taipei, Taiwan, ROC.

Publication information

J Chin Med Assoc. 2023 Aug 1;86(8):762-766. doi: 10.1097/JCMA.0000000000000946. Epub 2023 Jun 9.

Abstract

BACKGROUND

Chat Generative Pre-trained Transformer (ChatGPT; OpenAI Limited Partnership, San Francisco, CA, USA) is an artificial intelligence language model that has gained popularity because of its large training database and its ability to interpret and respond to a wide range of queries. Although researchers have tested it in different fields, its performance varies by domain. We aimed to further test its ability in the medical field.

METHODS

We used questions from Taiwan's 2022 Family Medicine Board Exam, which included both Chinese and English items and covered various question types, including reverse (negative-phrase) questions and multiple-choice questions, focusing mainly on general medical knowledge. We pasted each question into ChatGPT, recorded its response, and compared it with the correct answer provided by the exam board. We used SAS 9.4 (Cary, North Carolina, USA) and Excel to calculate the accuracy rate for each question type.
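The per-type accuracy calculation described above is straightforward tallying. A minimal Python sketch (using hypothetical grading records for illustration, not the study's actual data) might look like:

```python
from collections import defaultdict

def accuracy_by_type(records):
    """Compute the percentage of correct answers per question type.

    records: iterable of (question_type, is_correct) pairs.
    Returns {question_type: accuracy in percent, rounded to one decimal}.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for qtype, is_correct in records:
        totals[qtype] += 1
        if is_correct:
            correct[qtype] += 1
    return {t: round(100.0 * correct[t] / totals[t], 1) for t in totals}

# Hypothetical grading log: three case-scenario questions (two correct)
# and one negative-phrase question (incorrect).
sample = [
    ("case scenario", True),
    ("case scenario", False),
    ("case scenario", True),
    ("negative phrase", False),
]
print(accuracy_by_type(sample))  # → {'case scenario': 66.7, 'negative phrase': 0.0}
```

The same tallies feed directly into a chi-square or Fisher's exact test when comparing accuracy across question types, as the study's statistical comparison implies.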

RESULTS

ChatGPT answered 52 of 125 questions correctly, an accuracy rate of 41.6%. Question length did not affect accuracy. The accuracy rates were 45.5%, 33.3%, 58.3%, 50.0%, and 43.5% for negative-phrase questions, multiple-choice questions, questions with mutually exclusive options, case-scenario questions, and questions on Taiwan's local policy, respectively, with no statistically significant difference among them.

CONCLUSION

ChatGPT's accuracy rate was not good enough to pass Taiwan's Family Medicine Board Exam. Possible reasons include the difficulty of the specialist exam and the relatively weak coverage of traditional Chinese-language resources in its training data. However, ChatGPT performed acceptably on negative-phrase questions, mutually exclusive-option questions, and case-scenario questions, and it can be a helpful tool for learning and exam preparation. Future research could explore ways to improve ChatGPT's accuracy on specialist exams and in other domains.

