Karahan B N, Emekli E
Department of Radiology, Eskişehir Osmangazi University, Faculty of Medicine, Eskişehir, Türkiye.
Department of Radiology, Eskişehir Osmangazi University, Faculty of Medicine, Eskişehir, Türkiye; Translational Medicine Application and Research Center, Eskişehir Osmangazi University, Eskişehir, Türkiye.
Radiography (Lond). 2025 Jul 16;31(5):103087. doi: 10.1016/j.radi.2025.103087.
Creating high-quality multiple-choice questions (MCQs) is vital in health education, particularly in fields such as medical imaging. AI-based chatbots have emerged as a tool to automate this process. This study evaluates the applicability of MCQs generated by various AI chatbots for medical imaging education, together with their difficulty and discrimination indices.
A total of 80 MCQs were generated by eight AI-based chatbots (Claude 3, Claude 3.5, ChatGPT-3.5, ChatGPT-4.0, Copilot, Gemini, Turin Q, and Writesonic) using lecture materials. The questions were evaluated for relevance, accuracy, and originality by radiology faculty and then administered to 56 students and 12 research assistants. Cognitive levels were classified using Miller's Pyramid, and difficulty and discrimination indices were calculated.
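The difficulty and discrimination indices mentioned above are standard classical test theory item statistics. The sketch below shows how they are typically computed from a binary (correct/incorrect) response matrix; the function name, the 27% upper/lower scorer grouping, and the toy data are illustrative assumptions, not details taken from the study.

```python
"""Minimal item-analysis sketch (classical test theory), assuming a 0/1
response matrix of shape (examinees x items). Illustrative only."""
import numpy as np


def item_analysis(responses: np.ndarray, group_fraction: float = 0.27):
    """Return (difficulty, discrimination) arrays, one value per item.

    responses: 2-D array, rows = examinees, columns = items,
               1 = correct answer, 0 = incorrect answer.
    """
    n_examinees, _ = responses.shape

    # Difficulty index: proportion of examinees answering each item correctly.
    difficulty = responses.mean(axis=0)

    # Rank examinees by total score and take upper/lower groups
    # (here the conventional 27% of examinees per group).
    totals = responses.sum(axis=1)
    order = np.argsort(totals)
    n_group = max(1, int(round(group_fraction * n_examinees)))
    lower = responses[order[:n_group]]
    upper = responses[order[-n_group:]]

    # Discrimination index: difference in proportion correct
    # between the upper and lower scoring groups.
    discrimination = upper.mean(axis=0) - lower.mean(axis=0)
    return difficulty, discrimination


if __name__ == "__main__":
    # Toy data: 6 examinees, 3 items.
    toy = np.array([
        [1, 1, 0],
        [1, 0, 0],
        [1, 1, 1],
        [0, 0, 0],
        [1, 1, 1],
        [0, 1, 0],
    ])
    p, d = item_analysis(toy)
    print("difficulty:", p)        # proportions correct per item
    print("discrimination:", d)    # upper-group minus lower-group proportion
```

Under this convention, a difficulty index near 0.5 to 0.7 is usually read as moderate difficulty, and higher discrimination values indicate items that better separate high and low scorers.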
AI-based chatbots generated MCQs suitable for medical imaging education, with 72.5% of the questions deemed appropriate. Most questions assessed recall (79.31%), suggesting that AI models excel at generating basic knowledge questions but struggle with higher-order cognitive skills. Question quality differed among chatbots, with Claude 3 being the most reliable. The mean difficulty index was 0.62, indicating moderate difficulty, although some models produced easier questions.
AI chatbots show promise for automating MCQ creation in health education, though most questions focus on recall. For AI to fully support health education, further development is needed to improve question quality, especially in higher cognitive domains.
AI-based chatbots can support educators in generating MCQs, especially for assessing basic knowledge in medical imaging. While useful for saving time, expert review remains essential to ensure question quality and to address higher-level cognitive skills. Integrating AI tools into assessment workflows may enhance efficiency, provided there is appropriate oversight.