Yeates Peter, Maluf Adriano, McCray Gareth, Kinston Ruth, Cope Natalie, Cullen Kathy, O'Neill Vikki, Cole Aidan, Chung Ching-Wa, Goodfellow Rhian, Vallender Rebecca, Ensaff Sue, Goddard-Fuller Rikki, McKinley Robert
School of Medicine, Keele University, Keele, United Kingdom.
De Montfort University, Leicester, United Kingdom.
Med Teach. 2025 Apr;47(4):735-743. doi: 10.1080/0142159X.2024.2372087. Epub 2024 Jul 8.
Ensuring equivalence in high-stakes performance exams is important for patient safety and candidate fairness. We compared inter-school examiner differences within a shared OSCE and resulting impact on students' pass/fail categorisation.
The same 6-station formative OSCE ran asynchronously in 4 medical schools, with 2 parallel circuits per school. We compared examiners' judgements using Video-based Examiner Score Comparison and Adjustment (VESCA): examiners scored station-specific comparator videos in addition to 'live' student performances, enabling (1) controlled score comparisons by (a) examiner-cohorts and (b) schools, and (2) data linkage to adjust for the influence of examiner-cohorts. We calculated score impact and change in pass/fail categorisation by school.
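The linkage idea can be illustrated with a deliberately simplified sketch. It assumes a purely additive model in which each examiner-cohort's stringency is estimated as the gap between its mean score on the shared comparator videos and the mean across all cohorts, and that gap is then subtracted from the cohort's live scores. This is an illustration only, not the statistical model used in the study; all function names, cohort labels, scores, and the pass mark are hypothetical.

```python
from statistics import mean


def cohort_offsets(video_scores):
    """Estimate each examiner-cohort's offset from the cross-cohort mean.

    video_scores maps a cohort label to the scores (in percent) that its
    examiners gave to the same set of comparator videos.
    """
    cohort_means = {c: mean(s) for c, s in video_scores.items()}
    grand_mean = mean(list(cohort_means.values()))
    return {c: m - grand_mean for c, m in cohort_means.items()}


def adjust(live_scores, offsets):
    """Subtract the cohort offset from each (student, cohort, raw_score) row."""
    return [(student, cohort, raw - offsets[cohort])
            for student, cohort, raw in live_scores]


def failure_rate(scores, pass_mark):
    """Proportion of students whose score falls below the pass mark."""
    return sum(1 for _, _, s in scores if s < pass_mark) / len(scores)


# Hypothetical data: cohort A scores the shared videos 10 points higher than B.
video_scores = {"A": [60, 70, 65], "B": [50, 60, 55]}
live_scores = [("s1", "A", 62), ("s2", "B", 58), ("s3", "B", 57), ("s4", "A", 70)]
PASS_MARK = 60  # hypothetical pass mark, in percent

offsets = cohort_offsets(video_scores)   # {"A": 5, "B": -5}
adjusted = adjust(live_scores, offsets)

raw_fail = failure_rate(live_scores, PASS_MARK)      # 0.5  (s2 and s3 fail)
adjusted_fail = failure_rate(adjusted, PASS_MARK)    # 0.25 (only s1 fails)
```

In this toy example the adjustment changes which students fail as well as how many, which is the kind of pass/fail recategorisation that the study quantifies by school.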
On controlled video-based comparisons, inter-school variations in examiners' scoring (16.3%) were nearly double within-school variations (8.8%). Students' scores received a median adjustment of 5.26% (IQR 2.87-7.17%). The impact of adjusting for examiner differences on students' pass/fail categorisation varied by school, with adjustment reducing failure rate from 39.13% to 8.70% (school 2) whilst increasing failure from 0.00% to 21.74% (school 4).
Whilst the formative context may partly account for differences, these findings query whether variations may exist between medical schools in examiners' judgements. This may benefit from systematic appraisal to safeguard equivalence. VESCA provided a viable method for comparisons.