

Can AI grade like a professor? Comparing artificial intelligence and faculty scoring of medical student short-answer clinical reasoning exams.

Author Information

Arvind Rajan, Seth McKenzie Alexander, Christina L. Shenvi

Affiliations

Department of Medicine, The University of North Carolina School of Medicine, Chapel Hill, NC, USA.

Department of Health Sciences, The University of North Carolina School of Medicine, Chapel Hill, NC, USA.

Publication Information

Adv Health Sci Educ Theory Pract. 2025 Aug 6. doi: 10.1007/s10459-025-10462-3.

DOI: 10.1007/s10459-025-10462-3
PMID: 40767985
Abstract

Many medical schools primarily use multiple-choice questions (MCQs) in pre-clinical assessments due to their efficiency and consistency. However, while MCQs are easy to grade, they often fall short in evaluating higher-order reasoning and understanding student thought processes. Despite these limitations, MCQs remain popular because alternative assessments require more time and resources to grade. This study explored whether OpenAI's GPT-4o Large Language Model (LLM) could be used to effectively grade narrative short answer questions (SAQs) in case-based learning (CBL) exams when compared to faculty graders. The primary outcome was equivalence of LLM grading, assessed using a bootstrapping procedure to calculate 95% confidence intervals (CIs) for mean score differences. Equivalence was defined as the entire 95% CI falling within a ± 5% margin. Secondary outcomes included grading precision, subgroup analysis by Bloom's taxonomy, and correlation between question complexity and LLM performance. Analysis of 1,450 responses showed LLM scores were equivalent to faculty scores overall (mean difference: -0.55%, 95% CI: -1.53%, + 0.45%). Equivalence was also demonstrated for Remembering, Applying, and Analyzing questions, however, discrepancies were observed for Understanding and Evaluating questions. AI grading demonstrated high precision (ICC = 0.993, 95% CI: 0.992-0.994). Greater differences between LLM and faculty scores were found for more difficult questions (R2 = 0.6199, p < 0.0001). LLM grading could serve as a tool for preliminary scoring of student assessments, enhancing SAQ grading efficiency and improving undergraduate medical education examination quality. Secondary outcome findings emphasize the need to use these tools in combination with, not as a replacement for, faculty involvement in the grading process.


Similar Articles

1
Can AI grade like a professor? comparing artificial intelligence and faculty scoring of medical student short-answer clinical reasoning exams.
Adv Health Sci Educ Theory Pract. 2025 Aug 6. doi: 10.1007/s10459-025-10462-3.
2
Assessing the Accuracy and Reliability of Large Language Models in Psychiatry Using Standardized Multiple-Choice Questions: Cross-Sectional Study.
J Med Internet Res. 2025 May 20;27:e69910. doi: 10.2196/69910.
3
Artificial intelligence in radiology examinations: a psychometric comparison of question generation methods.
Diagn Interv Radiol. 2025 Jul 21. doi: 10.4274/dir.2025.253407.
4
Sexual Harassment and Prevention Training
5
Evaluating a Large Language Model in Translating Patient Instructions to Spanish Using a Standardized Framework.
JAMA Pediatr. 2025 Jul 7. doi: 10.1001/jamapediatrics.2025.1729.
6
AI in radiography education: Evaluating multiple-choice questions difficulty and discrimination.
J Med Imaging Radiat Sci. 2025 Mar 28;56(4):101896. doi: 10.1016/j.jmir.2025.101896.
7
Evaluating Bard Gemini Pro and GPT-4 Vision Against Student Performance in Medical Visual Question Answering: Comparative Case Study.
JMIR Form Res. 2024 Dec 17;8:e57592. doi: 10.2196/57592.
8
A Comprehensive and Modality Diverse Cervical Spine and Back Musculoskeletal Physical Exam Curriculum for Medical Students.
J Educ Teach Emerg Med. 2025 Jul 31;10(3):SG1-SG8. doi: 10.21980/J8RQ0N. eCollection 2025 Jul.
9
The educational effects of portfolios on undergraduate student learning: a Best Evidence Medical Education (BEME) systematic review. BEME Guide No. 11.
Med Teach. 2009 Apr;31(4):282-98. doi: 10.1080/01421590902889897.
10
Comparing Scoring Consistency of Large Language Models with Faculty for Formative Assessments in Medical Education.
J Gen Intern Med. 2025 Jan;40(1):127-134. doi: 10.1007/s11606-024-09050-9. Epub 2024 Oct 14.