Experts' prediction of item difficulty of multiple-choice questions in the Ethiopian Undergraduate Medicine Licensure Examination.

Author information

Institute of Health, Jimma University, Jimma, Ethiopia.

Faculty of Medicine, Institute of Health and Society, University of Oslo, Oslo, Norway.

Publication information

BMC Med Educ. 2024 Sep 16;24(1):1016. doi: 10.1186/s12909-024-06012-x.

Abstract

BACKGROUND

The ability of experts' item difficulty ratings to predict test-takers' actual performance is an important aspect of licensure examinations. Expert judgment serves as a primary source of information for making prior decisions that determine the pass rate of test takers. The nature of the raters involved in predicting item difficulty is central to setting credible standards. Therefore, this study aimed to assess and compare raters' predicted and actual multiple-choice question (MCQ) difficulty on the Undergraduate Medicine Licensure Examination (UGMLE) in Ethiopia.

METHOD

Responses from 815 examinees to 200 MCQs were used in this study. The study also included item difficulty ratings from seven physicians who participated in the standard setting of the UGMLE. Analyses were then conducted to understand variation in the experts' ratings when predicting the actual difficulty levels observed among examinees. Descriptive statistics were used to profile the mean rater-predicted and actual difficulty values of the MCQs, and ANOVA was used to compare mean differences between raters' item difficulty predictions. Additionally, regression analysis was used to examine interrater variation in item difficulty predictions relative to actual difficulty and to compute the proportion of variance in actual difficulty explained by the raters' predictions.
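
The following is a minimal sketch of this pipeline, not the authors' code: it assumes a 0/1 response matrix (815 examinees by 200 items) and a 7-by-200 matrix of rater-predicted proportion-correct values, with placeholder random data and hypothetical variable names standing in for the real dataset.

    # Sketch only: placeholder data; numpy, scipy, and statsmodels assumed available.
    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import f_oneway

    rng = np.random.default_rng(0)
    responses = rng.integers(0, 2, size=(815, 200))  # 0/1 answers, examinees x items
    ratings = rng.uniform(0.2, 0.9, size=(7, 200))   # predicted proportion correct, raters x items

    # Classical item difficulty: proportion of examinees answering each item correctly.
    actual_difficulty = responses.mean(axis=0)       # shape (200,)
    mean_predicted = ratings.mean(axis=0)            # descriptive profile across raters

    # One-way ANOVA comparing mean item difficulty ratings between the seven raters.
    anova = f_oneway(*ratings)
    print(f"ANOVA across raters: F = {anova.statistic:.2f}, p = {anova.pvalue:.3g}")

    # Regress actual difficulty on the raters' predictions; R^2 is the proportion
    # of variance in actual difficulty explained by the expert ratings. The paper
    # reports F(6, 193), i.e. six model degrees of freedom; how the seven ratings
    # were entered is not stated in the abstract, so this sketch uses all seven.
    X = sm.add_constant(ratings.T)                   # 200 items x (intercept + 7 raters)
    fit = sm.OLS(actual_difficulty, X).fit()
    print(f"R = {fit.rsquared ** 0.5:.2f}, R^2 = {fit.rsquared:.2f}")
    print(f"F({fit.df_model:.0f}, {fit.df_resid:.0f}) = {fit.fvalue:.2f}, p = {fit.f_pvalue:.3g}")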

RESULTS

In this study, the mean difference between raters' predictions and examinees' actual performance was inconsistent across the exam domains. The study revealed a statistically significant, strong positive correlation between actual and predicted item difficulty in exam domains eight and eleven, whereas a non-significant, very weak positive correlation was found in exam domains seven and twelve. Multiple comparison analysis showed significant differences in mean item difficulty ratings between raters. In the regression analysis, experts' item difficulty ratings explained 33% of the variance in the actual difficulty level of the UGMLE. The regression model also showed a moderate positive correlation (R = 0.57) that was statistically significant, F(6, 193) = 15.58, P = 0.001.
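
These figures are mutually consistent (the arithmetic below is added here for clarity, not taken from the paper): with n = 200 items and six model degrees of freedom,

\[
R^2 = 0.57^2 \approx 0.325, \qquad
F(6,\,193) = \frac{R^2 / 6}{(1 - R^2) / 193} = \frac{0.325 / 6}{0.675 / 193} \approx 15.5,
\]

which matches the reported F = 15.58 to rounding; the "33%" figure is simply R^2 expressed as a percentage.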

CONCLUSION

This study demonstrated the complexity of assessing the difficulty level of MCQs on the UGMLE and emphasized the benefits of obtaining experts' ratings in advance. To ensure the exam continues to yield reliable and valid scores, raters' accuracy on the UGMLE must be improved. Achieving this will require techniques that align with evolving assessment methodologies.
