Exploratory Assessment of GPT-4's Effectiveness in Generating Valid Exam Items in Pharmacy Education.

Author Information

Shultz Benjamin, DiDomenico Robert J, Goliak Kristen, Mucksavage Jeffrey

Affiliations

University of Illinois Chicago, Retzky College of Pharmacy, Chicago, IL, USA.

Publication Information

Am J Pharm Educ. 2025 May;89(5):101405. doi: 10.1016/j.ajpe.2025.101405. Epub 2025 Apr 15.

Abstract

OBJECTIVE

To evaluate the effectiveness of GPT-4 in generating valid multiple-choice exam items for assessing therapeutic knowledge in pharmacy education.

METHODS

A custom GPT application was developed to create 60 case-based items from a pharmacotherapy textbook. Nine subject matter experts reviewed items for content validity, difficulty, and quality. Valid items were compiled into a 38-question exam administered to 46 fourth-year pharmacy students. Classical test theory and Rasch analysis were used to assess psychometric properties.
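
The abstract names the psychometric procedures but not the analysis code. As a rough illustration only, a classical-test-theory item analysis of a dichotomously scored response matrix could be sketched in Python as follows; the function name classical_item_stats and the 0/1 response-matrix layout are assumptions for illustration and are not taken from the study.

```python
import numpy as np

def classical_item_stats(responses):
    """Classical test theory statistics for a dichotomously scored exam.

    responses: (n_students, n_items) array, 1 = correct, 0 = incorrect.
    Returns item difficulty, corrected point-biserial, and KR-20 reliability.
    """
    responses = np.asarray(responses, dtype=float)
    n_students, n_items = responses.shape
    total = responses.sum(axis=1)

    # Difficulty index: proportion of students answering each item correctly.
    difficulty = responses.mean(axis=0)

    # Corrected point-biserial: correlation of each item with the
    # rest-of-test score (total score minus the item itself).
    point_biserial = np.array([
        np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
        for j in range(n_items)
    ])

    # KR-20 reliability coefficient for dichotomous items.
    pq = difficulty * (1.0 - difficulty)
    kr20 = (n_items / (n_items - 1)) * (1.0 - pq.sum() / total.var())

    return difficulty, point_biserial, kr20
```

Common rules of thumb flag items whose difficulty index falls outside roughly 0.30 to 0.80 or whose corrected point-biserial drops below about 0.20; the specific cutoffs applied in this study are not stated in the abstract.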

RESULTS

Of 60 generated items, 38 met content validity requirements, with only 6 accepted without revisions. The exam demonstrated moderate reliability and correlated well with a prior cumulative therapeutics exam. Classical item analysis revealed that most items had acceptable point biserial correlations, though fewer than half fell within the recommended difficulty range. Rasch analysis indicated potential multidimensionality and suboptimal targeting of item difficulty to student ability.
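
For context on the targeting finding, the dichotomous Rasch model (a standard formulation, not specific to this study) gives the probability that student i answers item j correctly as a logistic function of the gap between the student's ability \(\theta_i\) and the item's difficulty \(b_j\):

\[
P(X_{ij} = 1 \mid \theta_i, b_j) = \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)}
\]

Suboptimal targeting means the estimated item difficulties cluster away from the students' ability distribution, so the items provide little information in the region where most examinees fall.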

CONCLUSION

GPT-4 offers a preliminary step toward generating exam content in pharmacy education but has clear limitations that require further investigation and validation. Substantial human oversight and psychometric evaluation are necessary to ensure clinical realism and appropriate difficulty. Future research with larger samples is needed to further validate the effectiveness of artificial intelligence in item generation for high-stakes assessments in pharmacy education.

