Wang Ying-Mei, Shen Hung-Wei, Chen Tzeng-Ji, Chiang Shu-Chiung, Lin Ting-Guan
Department of Medical Education and Research, Taipei Veterans General Hospital Hsinchu Branch, 81, Section 1, Zhongfeng Road, Zhudong, Hsinchu, 310, Taiwan, 886 03-5962134 ext 127.
Department of Pharmacy, Taipei Veterans General Hospital Hsinchu Branch, Hsinchu, Taiwan.
JMIR Med Educ. 2025 Jan 17;11:e56850. doi: 10.2196/56850.
OpenAI released ChatGPT (GPT-3.5) and GPT-4 between 2022 and 2023. GPT-3.5 has demonstrated proficiency in various examinations, particularly the United States Medical Licensing Examination, while GPT-4 offers more advanced capabilities.
This study aims to examine the efficacy of GPT-3.5 and GPT-4 within the Taiwan National Pharmacist Licensing Examination and to ascertain their utility and potential application in clinical pharmacy and education.
The pharmacist examination in Taiwan consists of 2 stages: basic subjects and clinical subjects. In this study, exam questions were manually fed into the GPT-3.5 and GPT-4 models, and their responses were recorded; graphic-based questions were excluded. This study encompassed 3 steps: (1) determining the answer accuracy of GPT-3.5 and GPT-4, (2) categorizing question types and observing differences in model performance across these categories, and (3) comparing model performance on calculation and situational questions. Microsoft Excel and R software were used for statistical analyses.
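As a hedged illustration of step 1, the brief Python sketch below shows how recorded model answers might be scored against the official key and summarized by subject category. The record fields and example items are hypothetical placeholders, not the study's data, and the actual analysis was performed in Microsoft Excel and R rather than Python.

```python
# Minimal sketch (not the authors' actual pipeline): score each model's recorded
# answer against the official key and tally accuracy per subject category.
# Field names ("subject", "answer", "gpt35", "gpt4") are assumptions.
from collections import defaultdict

def accuracy_by_category(records, model_key):
    """Return {category: (correct, total)} for one model's recorded answers."""
    tally = defaultdict(lambda: [0, 0])
    for r in records:
        tally[r["subject"]][0] += int(r[model_key] == r["answer"])  # correct count
        tally[r["subject"]][1] += 1                                  # total count
    return {cat: (c, n) for cat, (c, n) in tally.items()}

# Hypothetical example items (graphic-based questions already excluded):
records = [
    {"subject": "basic", "answer": "B", "gpt35": "B", "gpt4": "B"},
    {"subject": "basic", "answer": "D", "gpt35": "A", "gpt4": "D"},
    {"subject": "clinical", "answer": "C", "gpt35": "C", "gpt4": "C"},
]
for model in ("gpt35", "gpt4"):
    for cat, (correct, total) in accuracy_by_category(records, model).items():
        print(f"{model} {cat}: {correct}/{total} = {correct / total:.1%}")
```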
GPT-4 achieved an accuracy rate of 72.9%, surpassing GPT-3.5, which achieved 59.1% (P<.001). In the basic subjects category, GPT-4 significantly outperformed GPT-3.5 (73.4% vs 53.2%; P<.001). In clinical subjects, however, only minor differences in accuracy were observed. GPT-4 also outperformed GPT-3.5 on calculation and situational questions.
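For readers who want to see how a comparison of two accuracy rates of this kind can be tested, the sketch below runs a chi-square test on a 2x2 contingency table of correct versus incorrect answers. The question count is a hypothetical placeholder, since the abstract does not report item numbers, and the original analysis used R and Excel rather than Python.

```python
# Hedged sketch of a two-proportion comparison of overall accuracy.
# n_questions is a hypothetical placeholder, not the study's actual item count.
from scipy.stats import chi2_contingency

n_questions = 200                                  # assumed number of scored items
gpt4_correct = round(0.729 * n_questions)          # 72.9% accuracy (reported)
gpt35_correct = round(0.591 * n_questions)         # 59.1% accuracy (reported)

table = [
    [gpt4_correct, n_questions - gpt4_correct],    # GPT-4: correct, incorrect
    [gpt35_correct, n_questions - gpt35_correct],  # GPT-3.5: correct, incorrect
]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, df = {dof}, P = {p:.4f}")
```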
This study demonstrates that GPT-4 outperforms GPT-3.5 in the Taiwan National Pharmacist Licensing Examination, particularly in basic subjects. While GPT-4 shows potential for use in clinical practice and pharmacy education, its limitations warrant caution. Future research should focus on refining prompts, improving model stability, integrating medical databases, and designing questions that better assess student competence and minimize guessing.