Imperial College School of Medicine, Imperial College London, London, UK.
Warwick Medical School, University of Warwick, Warwick, UK.
BMC Med Educ. 2022 Oct 5;22(1):708. doi: 10.1186/s12909-022-03753-5.
Standard setting for clinical examinations typically uses the borderline regression method to set the pass mark. An assumption made in using this method is that there are equal intervals between global ratings (GRs) (e.g. Fail, Borderline Pass, Clear Pass, Good and Excellent). However, to the best of our knowledge, this assumption has never been tested in the medical literature. We examine whether the assumption of equal intervals between GRs is met, and the potential implications for student outcomes.
Clinical finals examiners were recruited across two institutions to place the typical 'Borderline Pass', 'Clear Pass' and 'Good' candidate on a continuous slider scale between a typical 'Fail' candidate at point 0 and a typical 'Excellent' candidate at point 1. Results were analysed using one-sample t-tests comparing each interval against an expected equal interval size of 0.25. Secondary data analysis was performed on summative assessment scores for 94 clinical stations and 1191 medical student examination outcomes from the final 2 years of study at a single centre.
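A minimal sketch of the interval analysis described above, assuming the examiner slider placements are available as plain lists of floats per GR. The example data, variable names and the Bonferroni-style threshold of 0.05/4 (matching the reported p < 0.0125 cut-off) are illustrative assumptions, not the study's actual dataset or analysis code.

```python
from scipy import stats

# Hypothetical slider placements (0 = Fail, 1 = Excellent) from a few examiners.
ratings = {
    "Borderline Pass": [0.30, 0.35, 0.32, 0.36, 0.31],
    "Clear Pass":      [0.55, 0.52, 0.58, 0.54, 0.56],
    "Good":            [0.78, 0.75, 0.80, 0.76, 0.77],
}

# The anchors are fixed at 0 (Fail) and 1 (Excellent) for every examiner.
n = len(ratings["Borderline Pass"])
ordered = [[0.0] * n, ratings["Borderline Pass"], ratings["Clear Pass"],
           ratings["Good"], [1.0] * n]
interval_names = ["Fail-Borderline Pass", "Borderline Pass-Clear Pass",
                  "Clear Pass-Good", "Good-Excellent"]

EXPECTED = 0.25      # equal-interval assumption
ALPHA = 0.05 / 4     # corrected threshold, matching the reported p < 0.0125

# One-sample t-test of each interval against the expected width of 0.25.
for name, lower, upper in zip(interval_names, ordered, ordered[1:]):
    intervals = [u - l for l, u in zip(lower, upper)]
    t_stat, p_value = stats.ttest_1samp(intervals, EXPECTED)
    verdict = "differs from 0.25" if p_value < ALPHA else "no evidence of difference"
    print(f"{name}: mean={sum(intervals) / n:.3f}, p={p_value:.4f} ({verdict})")
```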
On a scale from 0.00 (Fail) to 1.00 (Excellent), mean examiner GRs for 'Borderline Pass', 'Clear Pass' and 'Good' were 0.33, 0.55 and 0.77 respectively. All four intervals between GRs (Fail-Borderline Pass, Borderline Pass-Clear Pass, Clear Pass-Good, Good-Excellent) were statistically significantly different from the expected value of 0.25 (all p-values < 0.0125). An ordinal linear regression using mean examiner GRs was performed for each of the 94 stations to determine pass marks out of 24. This increased pass marks for all 94 stations compared with the original GR locations (mean increase 0.21), resulting in one additional fail by overall exam pass mark (out of 1191 students) and 92 additional station fails (out of 11,346 stations).
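The sketch below illustrates one plausible reading of how shifting the GR coding changes a station's borderline-regression pass mark: checklist scores are regressed on the numeric GR locations, and the pass mark is the predicted score at the 'Borderline Pass' location. The candidate scores are fabricated for illustration; only the empirical locations (0.33, 0.55, 0.77) come from the results reported above, and the exact adjustment used in the study may differ.

```python
import numpy as np

# Hypothetical candidates for one station: checklist score out of 24 and awarded GR.
scores = np.array([8, 11, 13, 12, 15, 16, 17, 18, 20, 21, 22, 23], dtype=float)
grs = ["Fail", "Fail", "Borderline Pass", "Borderline Pass", "Clear Pass",
       "Clear Pass", "Clear Pass", "Good", "Good", "Good",
       "Excellent", "Excellent"]

def pass_mark(gr_locations):
    """Regress scores on the numeric GR coding and read off the predicted
    score at the 'Borderline Pass' location (borderline regression)."""
    x = np.array([gr_locations[g] for g in grs])
    slope, intercept = np.polyfit(x, scores, 1)   # simple linear regression
    return slope * gr_locations["Borderline Pass"] + intercept

equal_intervals = {"Fail": 0.00, "Borderline Pass": 0.25, "Clear Pass": 0.50,
                   "Good": 0.75, "Excellent": 1.00}
empirical       = {"Fail": 0.00, "Borderline Pass": 0.33, "Clear Pass": 0.55,
                   "Good": 0.77, "Excellent": 1.00}

print(f"Equal-interval pass mark: {pass_mark(equal_intervals):.2f} / 24")
print(f"Adjusted pass mark:       {pass_mark(empirical):.2f} / 24")
```

With these fabricated scores the adjusted coding yields a slightly higher pass mark than the equal-interval coding, the same direction of effect as the mean increase of 0.21 marks reported above.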
Although the current assumption of equal intervals between GRs across the performance spectrum is not met, and an adjusted regression equation causes an increase in station pass marks, the effect on overall exam pass/fail outcomes is modest.