Leeds Institute of Medical Education, School of Medicine, University of Leeds, Leeds, UK.
School of Medicine, University of Liverpool, Liverpool, UK.
Med Teach. 2020 Sep;42(9):1037-1042. doi: 10.1080/0142159X.2020.1781072. Epub 2020 Jul 1.
There has been a long-running debate about the validity of item-based checklist scoring of performance assessments such as OSCEs. In recent years, the conception of a checklist has developed from its dichotomous inception into a more 'key-features' and/or chunked approach, in which 'items' can be weighted differently, but the literature does not always reflect these broader conceptions. We consider theoretical, design and (clinically trained) assessor issues related to differential item weighting in checklist scoring of OSCE stations. Using empirical evidence, this work also compares candidate decisions and psychometric quality under two item-weighting approaches (i.e. a simple 'unweighted' scheme versus a differentially weighted one). The choice of weighting scheme affects approximately 30% of the key borderline group of candidates, and 3% of candidates overall. We also find that measures of overall assessment quality are a little better under the differentially weighted scoring system. Differentially weighted modern checklists can contribute to valid assessment outcomes, and bring a range of additional benefits to the assessment. Judgments about the weighting of particular items should be treated as a key design consideration during station development and must align with clinical assessor expectations of the relative importance of sub-tasks.
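The difference between the two scoring schemes can be illustrated with a minimal sketch. The item names, marks, and weights below are hypothetical and chosen only for illustration; they are not taken from the study, and the function `checklist_score` is an assumed helper, not the authors' scoring method.

```python
from typing import Dict


def checklist_score(marks: Dict[str, int], weights: Dict[str, float]) -> float:
    """Return one candidate's percentage score on one station.

    marks:   item name -> 1 if the assessor credited the item, else 0
    weights: item name -> relative importance (all 1.0 means 'unweighted')
    """
    available = sum(weights[item] for item in marks)
    achieved = sum(weights[item] * marks[item] for item in marks)
    return 100.0 * achieved / available


# Hypothetical station: the candidate misses the one clinically critical item.
marks = {
    "introduces_self": 1,
    "checks_identity": 1,
    "identifies_red_flag": 0,
    "safety_netting": 1,
}

# Simple 'unweighted' scheme: every item counts equally.
unweighted = {item: 1.0 for item in marks}

# Differentially weighted scheme (weights are illustrative assumptions).
weighted = {
    "introduces_self": 1.0,
    "checks_identity": 1.0,
    "identifies_red_flag": 3.0,
    "safety_netting": 2.0,
}

unweighted_score = checklist_score(marks, unweighted)  # 3 of 4 points
weighted_score = checklist_score(marks, weighted)      # 4 of 7 points

print(f"unweighted: {unweighted_score:.1f}%")
print(f"weighted:   {weighted_score:.1f}%")
```

In this made-up case the same assessor marks yield a lower percentage under the weighted scheme because the missed item carries more weight, which is the kind of shift that could change the outcome for a candidate near the pass mark.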