Yeates Peter, McCray Gareth
School of Medicine, Keele University, David Weatherall Building, Keele, ST5 5BG, UK.
BMC Med Educ. 2024 Dec 18;24(1):1466. doi: 10.1186/s12909-024-06462-3.
Ensuring examiner equivalence across distributed assessment locations is a priority within distributed Objective Structured Clinical Exams (OSCEs) but is challenging as examiners are typically fully nested within locations (i.e. no overlap in performances seen by different groups of examiners). Video-based Examiner Score Comparison and Adjustment (VESCA) is a recently developed method which uses video-based linking to compare and (potentially) adjust for the effect of different groups of examiners within OSCEs. Whilst initial research on VESCA has been promising, the accuracy of the resulting adjusted scores is unknown. Given this, we aimed to investigate the accuracy of adjusted scores produced by VESCA under a range of plausible operational parameters.
Using statistical simulation, we investigated how five factors influenced the accuracy of adjusted scores: (1) the proportion of participating examiners, (2) the number of linking videos, (3) baseline differences in examiner stringency between schools (i.e. whether examiners in School A are, on average, more stringent than the examiners in School B), (4) the number of OSCE stations and (5) different degrees of random error within examiners' judgements. We generated distributions of students' "true" performances across several stations, added examiner error, and simulated linking through crossed video-scoring (as occurs in VESCA). We then used Many Facet Rasch Modelling to produce an adjusted score for each student, which we compared with their corresponding original "true" performance score. We replicated this 1000 times for each permutation to determine average error reduction and the proportion of students whose scores became more accurate. Simulation parameters were derived from a real, summative, whole curriculum undergraduate Year 3 OSCE at Keele University School of Medicine.
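The simulation logic described above can be illustrated with a minimal sketch. It is not the study's code. It assumes two schools with fully nested examiners, and it replaces Many Facet Rasch Modelling with a much simpler adjustment: subtracting each school's mean deviation on the shared linking videos. All names (`simulate_once`, `run`) and all parameter values (score scale, standard deviations, numbers of students, stations, videos and linking examiners) are hypothetical choices made for illustration only.

```python
import random
import statistics


def simulate_once(rng, n_students=60, n_stations=12, n_videos=4,
                  n_link_examiners=6, baseline_diff=10.0, examiner_sd=5.0):
    """One replication with two schools and video-based linking.

    Every default value is an illustrative assumption, not a study parameter.
    Returns (proportional error reduction, proportion of students improved).
    """
    # School A examiners are more stringent, so they award lower scores.
    stringency = {"A": -baseline_diff / 2, "B": baseline_diff / 2}

    # Generate "true" performances, then add stringency and random examiner error.
    students = []
    for school in ("A", "B"):
        for _ in range(n_students):
            ability = rng.gauss(60, 8)
            true = [ability + rng.gauss(0, 5) for _ in range(n_stations)]
            observed = [t + stringency[school] + rng.gauss(0, examiner_sd)
                        for t in true]
            students.append((school, statistics.fmean(true),
                             statistics.fmean(observed)))

    # Linking: examiners from both schools score the same videoed performances.
    video_true = [rng.gauss(60, 8) for _ in range(n_videos)]
    video_mean = {}
    for school in ("A", "B"):
        scores = [v + stringency[school] + rng.gauss(0, examiner_sd)
                  for v in video_true for _ in range(n_link_examiners)]
        video_mean[school] = statistics.fmean(scores)
    grand_mean = (video_mean["A"] + video_mean["B"]) / 2
    estimated_effect = {s: video_mean[s] - grand_mean for s in ("A", "B")}

    # Compare unadjusted and adjusted scores against the "true" scores.
    raw_err = [abs(obs - true) for _, true, obs in students]
    adj_err = [abs(obs - estimated_effect[s] - true) for s, true, obs in students]
    reduction = 1 - statistics.fmean(adj_err) / statistics.fmean(raw_err)
    improved = sum(a < r for a, r in zip(adj_err, raw_err)) / len(students)
    return reduction, improved


def run(n_reps=200, seed=1, **params):
    """Average both outcome measures over n_reps replications."""
    rng = random.Random(seed)
    results = [simulate_once(rng, **params) for _ in range(n_reps)]
    mean_reduction = statistics.fmean([r[0] for r in results])
    mean_improved = statistics.fmean([r[1] for r in results])
    return mean_reduction, mean_improved


if __name__ == "__main__":
    for diff in (0.0, 5.0, 10.0):
        reduction, improved = run(baseline_diff=diff)
        print(f"baseline_diff={diff}: error reduction={reduction:.2f}, "
              f"students improved={improved:.2f}")
```

Varying `baseline_diff`, `n_videos`, `n_link_examiners`, `n_stations` and `examiner_sd` in this sketch loosely mirrors the five factors listed above, although the numerical output should not be expected to match the published results because the adjustment model and parameters differ.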
We found that in all conditions where no baseline difference existed between groups of examiners, score adjustment only minimally improved or even worsened score accuracy. Conversely, as the size of baseline differences between schools increased, adjustment accuracy increased, reducing error by up to 71% and making scores more accurate for up to 93% of students in the 20% baseline-difference condition.
Score adjustment through VESCA has the potential to substantially enhance equivalence for candidates in distributed OSCEs in some circumstances, whilst making scores less accurate in others. These findings will support judgements about when score adjustment may beneficially aid OSCE equivalence.