Fowell S L, Fewtrell R, McLaughlin P J
School of Medical Education, University of Liverpool, 2nd Floor Cedar House, Liverpool, UK.
Adv Health Sci Educ Theory Pract. 2008 Mar;13(1):11-24. doi: 10.1007/s10459-006-9027-1. Epub 2006 Sep 7.
Absolute standard setting procedures are recommended for assessment in medical education. Absolute, test-centred standard setting procedures were introduced for written assessments in the Liverpool MBChB in 2001. The modified Angoff and Ebel methods have been used for short answer question-based and extended matching question-based papers, respectively. The data collected have been analysed to investigate whether reliable standards can be achieved for small-scale, medical school-based assessments, and to establish the minimum number of judges required and the effect of a discussion phase on reliability. The root mean squared error (RMSE) has been used as a measure of reliability and to compute 95% confidence intervals for comparison with the examination statistics. The RMSE has also been used to calculate the minimum number of judges required to obtain a predetermined minimum level of reliability, and the effects of the number of judges and the number of items have been examined. The RMSE values obtained range from 0.9% to 2.2%. Using average variances across each paper type, at least 10 judges are needed before discussion, or at least 6 judges after discussion, to obtain an RMSE of less than 2%. The results indicate that including a discussion phase improves reliability and reduces the minimum number of judges required. Decision studies indicate that increasing the number of questions included in the assessments would not significantly improve the reliability of the standard setting.
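The relationship between panel size and RMSE described above can be illustrated with a minimal sketch. It assumes the simplest possible model, in which the RMSE of the standard is the standard error of the mean cut score across judges, sqrt(variance / number of judges). The abstract does not state the exact formula used in the study, so this model, the function names and the variance values below are all assumptions made for illustration; none of the numbers come from the paper.

```python
import math


def rmse_of_standard(judge_variance, n_judges):
    """RMSE of the mean cut score, assuming it equals the standard error
    of the mean across judges (an assumed model, not the paper's stated one).

    judge_variance is the between-judge variance of cut scores, in
    squared percentage points.
    """
    if judge_variance < 0 or n_judges < 1:
        raise ValueError("variance must be >= 0 and n_judges >= 1")
    return math.sqrt(judge_variance / n_judges)


def min_judges(judge_variance, target_rmse):
    """Smallest panel size whose RMSE is strictly below target_rmse."""
    if target_rmse <= 0:
        raise ValueError("target_rmse must be positive")
    n = 1
    while rmse_of_standard(judge_variance, n) >= target_rmse:
        n += 1
    return n


def confidence_interval_95(cut_score, rmse):
    """Approximate 95% confidence interval around a cut score."""
    half_width = 1.96 * rmse
    return (cut_score - half_width, cut_score + half_width)


# Hypothetical variances (squared percentage points), NOT taken from the study.
variance_before_discussion = 30.0
variance_after_discussion = 15.0

for label, variance in [("before", variance_before_discussion),
                        ("after", variance_after_discussion)]:
    n = min_judges(variance, target_rmse=2.0)
    print(f"{label} discussion: at least {n} judges for RMSE < 2%")

# Hypothetical cut score of 55% with an RMSE of 1.5%.
low, high = confidence_interval_95(55.0, 1.5)
print(f"95% CI: {low:.2f}% to {high:.2f}%")
```

Under this assumed model, halving the between-judge variance (as a discussion phase might) roughly halves the number of judges needed to reach the same RMSE, which matches the direction of the effect reported in the abstract.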