
Can ChatGPT Generate Acceptable Case-Based Multiple-Choice Questions for Medical School Anatomy Exams? A Pilot Study on Item Difficulty and Discrimination.

Author Information

Kıyak Yavuz Selim, Soylu Ayşe, Coşkun Özlem, Budakoğlu Işıl İrem, Peker Tuncay Veysel

Affiliations

Department of Medical Education and Informatics, Faculty of Medicine, Gazi University, Ankara, Turkey.

Department of Anatomy, Faculty of Medicine, Gazi University, Ankara, Turkey.

Publication Information

Clin Anat. 2025 May;38(4):505-510. doi: 10.1002/ca.24271. Epub 2025 Mar 24.

Abstract

Developing high-quality multiple-choice questions (MCQs) for medical school exams is effortful and time-consuming. In this study, we investigated the ability of ChatGPT to generate case-based anatomy MCQs with acceptable levels of item difficulty and discrimination for medical school exams. We used ChatGPT to generate case-based anatomy MCQs for an endocrine and urogenital system exam based on a framework for artificial intelligence (AI)-assisted item generation. The questions were evaluated by experts, approved by the department, and administered to 502 second-year medical students (372 Turkish-language, 130 English-language). The items were analyzed to determine the discrimination and difficulty indices. The item discrimination indices ranged from 0.29 to 0.54, indicating acceptable differentiation between high- and low-performing students. All items in Turkish (six out of six) and five out of six in English met the higher discrimination threshold (≥ 0.30) required for large-scale standardized tests. The item difficulty indices ranged from 0.41 to 0.89, with most items falling within the moderate difficulty range (0.20-0.80). Therefore, it was concluded that ChatGPT can generate case-based anatomy MCQs with acceptable psychometric properties, offering a promising tool for medical educators. However, human expertise remains crucial for reviewing and refining AI-generated assessment items. Future research should explore AI-generated MCQs across various anatomy topics and investigate different AI models for question generation.

