Wang Ying-Mei, Shen Hung-Wei, Chen Tzeng-Ji, Chiang Shu-Chiung, Lin Ting-Guan
Department of Medical Education and Research, Taipei Veterans General Hospital Hsinchu Branch, 81, Section 1, Zhongfeng Road, Zhudong, Hsinchu, 310, Taiwan, 886 03-5962134 ext 127.
Department of Pharmacy, Taipei Veterans General Hospital Hsinchu Branch, Hsinchu, Taiwan.
JMIR Med Educ. 2025 Jan 17;11:e56850. doi: 10.2196/56850.
OpenAI released ChatGPT (GPT-3.5) and GPT-4 between 2022 and 2023. GPT-3.5 has demonstrated proficiency in various examinations, particularly the United States Medical Licensing Examination, while GPT-4 offers more advanced capabilities.
This study aims to examine the efficacy of GPT-3.5 and GPT-4 within the Taiwan National Pharmacist Licensing Examination and to ascertain their utility and potential application in clinical pharmacy and education.
The pharmacist examination in Taiwan consists of 2 stages: basic subjects and clinical subjects. In this study, exam questions were manually fed into the GPT-3.5 and GPT-4 models, and their responses were recorded; graphic-based questions were excluded. This study encompassed 3 steps: (1) determining the answer accuracy of GPT-3.5 and GPT-4, (2) categorizing question types and observing differences in model performance across these categories, and (3) comparing model performance on calculation and situational questions. Microsoft Excel and R software were used for statistical analyses.
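As a hedged illustration of step 1, the brief Python sketch below shows how recorded model answers might be scored against the official key and summarized by subject category. The record fields and example items are hypothetical placeholders, not the study's data, and the actual analysis was performed in Microsoft Excel and R rather than Python.

```python
# Minimal sketch (not the authors' actual pipeline): score each model's recorded
# answer against the official key and tally accuracy per subject category.
# Field names ("subject", "answer", "gpt35", "gpt4") are assumptions.
from collections import defaultdict

def accuracy_by_category(records, model_key):
    """Return {category: (correct, total)} for one model's recorded answers."""
    tally = defaultdict(lambda: [0, 0])
    for r in records:
        tally[r["subject"]][0] += int(r[model_key] == r["answer"])  # correct count
        tally[r["subject"]][1] += 1                                  # total count
    return {cat: (c, n) for cat, (c, n) in tally.items()}

# Hypothetical example items (graphic-based questions already excluded):
records = [
    {"subject": "basic", "answer": "B", "gpt35": "B", "gpt4": "B"},
    {"subject": "basic", "answer": "D", "gpt35": "A", "gpt4": "D"},
    {"subject": "clinical", "answer": "C", "gpt35": "C", "gpt4": "C"},
]
for model in ("gpt35", "gpt4"):
    for cat, (correct, total) in accuracy_by_category(records, model).items():
        print(f"{model} {cat}: {correct}/{total} = {correct / total:.1%}")
```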
GPT-4 achieved an accuracy rate of 72.9%, surpassing GPT-3.5, which achieved 59.1% (P<.001). In the basic subjects category, GPT-4 significantly outperformed GPT-3.5 (73.4% vs 53.2%; P<.001). In clinical subjects, however, only minor differences in accuracy were observed. GPT-4 also outperformed GPT-3.5 on calculation and situational questions.
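For readers who want to see how a comparison of two accuracy rates of this kind can be tested, the sketch below runs a chi-square test on a 2x2 contingency table of correct versus incorrect answers. The question count is a hypothetical placeholder, since the abstract does not report item numbers, and the original analysis used R and Excel rather than Python.

```python
# Hedged sketch of a two-proportion comparison of overall accuracy.
# n_questions is a hypothetical placeholder, not the study's actual item count.
from scipy.stats import chi2_contingency

n_questions = 200                                  # assumed number of scored items
gpt4_correct = round(0.729 * n_questions)          # 72.9% accuracy (reported)
gpt35_correct = round(0.591 * n_questions)         # 59.1% accuracy (reported)

table = [
    [gpt4_correct, n_questions - gpt4_correct],    # GPT-4: correct, incorrect
    [gpt35_correct, n_questions - gpt35_correct],  # GPT-3.5: correct, incorrect
]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, P = {p:.4f}")
```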
This study demonstrates that GPT-4 outperforms GPT-3.5 in the Taiwan National Pharmacist Licensing Examination, particularly in basic subjects. While GPT-4 shows potential for use in clinical practice and pharmacy education, its limitations warrant caution. Future research should focus on refining prompts, improving model stability, integrating medical databases, and designing questions that better assess student competence and minimize guessing.