P. Yeates is a senior lecturer in medical education research, School of Medicine, Keele University, Keele, Staffordshire, and a consultant in acute and respiratory medicine, Fairfield General Hospital, Pennine Acute Hospitals NHS Trust, Bury, Lancashire, United Kingdom; ORCID: https://orcid.org/0000-0001-6316-4051.
A. Moult is a research assistant in medical education, School of Medicine, Keele University, Keele, Staffordshire, United Kingdom; ORCID: https://orcid.org/0000-0002-9424-5660.
Acad Med. 2021 Aug 1;96(8):1189-1196. doi: 10.1097/ACM.0000000000004028. Epub 2021 Mar 2.
Ensuring that examiners in different parallel circuits of objective structured clinical examinations (OSCEs) judge to the same standard is critical to the chain of validity. Recent work suggests that the examiner cohort (i.e., the particular group of examiners) a candidate encounters could significantly alter their outcome. Despite this, examiner-cohort effects are rarely examined, since fully nested data (i.e., no crossover between the students judged by different examiner groups) limit comparisons. In this study, the authors aimed to replicate and further develop a novel method, Video-based Examiner Score Comparison and Adjustment (VESCA), so that it can be used to enhance quality assurance of distributed or national OSCEs.
In 2019, 6 volunteer students were filmed on 12 stations in a summative OSCE. In addition to examining live student performances, examiners from 8 separate examiner-cohorts scored the pool of video performances; each examiner scored the videos specific to their station. The video scores linked the otherwise fully nested data, enabling comparisons by many-facet Rasch modeling. The authors compared and adjusted for examiner-cohort effects. They also compared examiners' scores for videos that were embedded (interspersed between live students during the OSCE) with their scores for videos judged later via the Internet.
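To make the linkage idea concrete, the sketch below builds a toy long-format rating table and cross-tabulates performances against examiner cohorts: live performances sit in only one cohort (fully nested), while the shared video performances appear under every cohort and so connect the circuits. All names, cohorts, and scores here are hypothetical, and the study itself analyzed this linkage with many-facet Rasch modeling rather than the simple tabulation shown.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per examiner score per performance.
# Live performances are judged by a single cohort (fully nested design).
live = pd.DataFrame({
    "performance": ["live_A", "live_B", "live_C", "live_D"],
    "cohort":      ["circuit_1", "circuit_1", "circuit_2", "circuit_2"],
    "score":       [19.0, 21.5, 17.0, 20.0],
})
# Shared video performances are scored by examiners from every cohort,
# creating the crossover that links otherwise disconnected circuits.
video = pd.DataFrame({
    "performance": ["video_1", "video_2"] * 2,
    "cohort":      ["circuit_1", "circuit_1", "circuit_2", "circuit_2"],
    "score":       [18.5, 22.0, 16.5, 20.5],
})
ratings = pd.concat([live, video], ignore_index=True)

# Video rows appear under both cohorts, so the cohorts now share common
# performances and can be compared (and adjusted) on one scale.
print(pd.crosstab(ratings["performance"], ratings["cohort"]))
```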
After accounting for differences in students' ability, different examiner-cohorts' scores for a student of the same ability ranged from 18.57 of 27 (68.8%) to 20.49 of 27 (75.9%), Cohen's d = 1.3. Score adjustment changed the pass/fail classification for up to 16% of students, depending on the modeled cut score. Internet-based and embedded video scoring showed no difference in mean scores or variability. Examiners' accuracy did not deteriorate over the 3-week Internet scoring period.
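As an illustrative back-calculation only (not part of the published analysis), the arithmetic below reproduces the reported percentages from the raw scores and the 27-point maximum, and derives the pooled standard deviation implied by the reported Cohen's d; that SD is an inference of this sketch, not a figure from the paper.

```python
# Illustrative arithmetic: check the reported percentages and infer the
# pooled SD implied by Cohen's d (an assumption of this sketch).
max_score = 27.0
low, high = 18.57, 20.49

print(f"{low}/{max_score} = {low / max_score:.1%}")    # ~68.8%
print(f"{high}/{max_score} = {high / max_score:.1%}")  # ~75.9%

cohens_d = 1.3
implied_pooled_sd = (high - low) / cohens_d            # ~1.48 scale points
print(f"implied pooled SD ≈ {implied_pooled_sd:.2f}")
```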
Examiner-cohorts produced a replicable, significant influence on OSCE scores that was unaccounted for by typical assessment psychometrics. VESCA offers a promising means to enhance validity and fairness in distributed OSCEs or national exams. Internet-based scoring may enhance VESCA's feasibility.