University of New South Wales Australia, Sydney, Australia.
Cardiff University, Cardiff, UK.
BMC Med Educ. 2018 Jun 7;18(1):126. doi: 10.1186/s12909-018-1238-7.
Standard setting is one of the most contentious topics in educational measurement. Commonly used methods all have well-reported limitations. To date, there is no conclusive evidence suggesting which standard-setting method yields the highest validity.
The method described and piloted in this study asked expert judges to estimate the scores on a real MCQ examination that they considered to indicate a clear pass, a clear fail, and the pass mark for the examination as a whole. The means and SDs of the judges' estimates, together with Z scores and confidence intervals, were used to derive the cut-score and the confidence in it.
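As a rough illustration of the summary statistics mentioned above, the following Python sketch computes the mean, sample SD, standard error, and a z-based confidence interval for a set of judges' pass-mark estimates. It is not the study's procedure: the abstract does not state how the clear-pass, clear-fail, and pass-mark estimates are combined into the final cut-score, so only the descriptive step is shown. The function name `summarize_estimates`, the 95% multiplier of 1.96, and the sample scores are assumptions made for illustration.

```python
import math
import statistics


def summarize_estimates(estimates, z=1.96):
    """Summarize judges' score estimates.

    Returns the mean, sample SD, standard error of the mean, and a
    z-based confidence interval (z=1.96 is an assumed 95% level).
    """
    n = len(estimates)
    if n < 2:
        raise ValueError("at least two judges are needed to compute an SD")
    mean = statistics.mean(estimates)
    sd = statistics.stdev(estimates)  # sample SD (n - 1 denominator)
    se = sd / math.sqrt(n)            # standard error of the mean
    return {
        "mean": mean,
        "sd": sd,
        "se": se,
        "ci": (mean - z * se, mean + z * se),
    }


# Hypothetical pass-mark estimates (percent scores) from five judges.
pass_marks = [58, 60, 62, 55, 65]
summary = summarize_estimates(pass_marks)
print(summary)
```

The same function could be applied separately to the clear-pass and clear-fail estimates. A wider interval indicates more disagreement among judges and therefore less confidence in the resulting value.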
In this example the new method's cut-score was higher than the judges' estimate. The method also yielded estimates of statistical error which determine the range of the acceptable cut-score and the estimated level of confidence one may have in the accuracy of that cut-score.
This new standard-setting method offers some advances, and possibly advantages, in that the decisions being asked of judges are based on firmer constructs, and it takes into account variation among judges.