Burr Steven, Martin Theresa, Edwards James, Ferguson Colin, Gilbert Kerry, Gray Christian, Hill Adele, Hosking Joanne, Johnstone Karen, Kisielewska Jolanta, Milsom Chloe, Moyes Siobhan, Rigby-Jones Ann, Robinson Iain, Toms Nick, Watson Helen, Zahra Daniel
Peninsula Medical School.
School of Clinical Medicine.
MedEdPublish (2016). 2021 Feb 3;10:32. doi: 10.15694/mep.2021.000032.1. eCollection 2021.
This article was migrated. The article was marked as recommended. We challenge the philosophical acceptability of the Angoff method and propose an alternative method of standard setting. The alternative is based on two judgements per test item: how important it is for candidates to know the material the item assesses, and how difficult it is for a subgroup of candidates to answer the item. Here we evaluate the practicalities of this alternative method for the first time, in direct comparison with an Angoff method. To counter bias from any leading (order) effects, we adopted a prospective cross-over design with two groups of judges (n=7 and n=8). Each group set the standards for the same two 100-item multiple choice question tests, using both methods. Overall, the two methods took a similar amount of time to complete. The alternative method produced a higher cut-score (by 12-14%) and greater variability between judges' cut-scores (by 5%). When using the alternative method, judges reported a small but statistically significant increase (of 3%) in their confidence that they had set the standard accurately. This is a new approach to standard setting in which the quantitative differences are slight, but the alternative method offers clear qualitative advantages.