Santen Sally A, Ryan Michael, Helou Marieka A, Richards Alicia, Perera Robert A, Haley Kellen, Bradner Melissa, Rigby Fidelma B, Park Yoon Soo
Virginia Commonwealth University School of Medicine, Richmond, VA, USA.
College of Medicine, University of Illinois at Chicago, Chicago, IL, USA.
Med Teach. 2021 Dec;43(12):1374-1380. doi: 10.1080/0142159X.2021.1948519. Epub 2021 Sep 17.
Systematic differences among raters' approaches to student assessment may result in leniency or stringency of assessment scores. This study examines the generalizability of medical student workplace-based competency assessments, including the impact of adjusting scores for rater leniency and stringency.
Data were collected from summative clerkship assessments completed for 204 students during the 2017-2018 clerkship year at a single institution. Generalizability theory was used to explore the variance attributed to different facets (rater, learner, item, and competency domain) through three unbalanced random-effects models per clerkship, including models that applied assessor stringency-leniency adjustments.
In the original assessments, only 4-8% of the variance was attributed to the student with the remainder being rater variance and error. Aggregating items to create a composite score increased variability attributable to the student (5-13% of variance). Applying a stringency-leniency ('hawk-dove') correction substantially increased the variance attributed to the student (14.8-17.8%) and reliability. Controlling for assessor leniency/stringency reduced measurement error, decreasing the number of assessments required for generalizability from 16-50 to 11-14.
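The link between student variance and the number of assessments needed follows from the generalizability (G) coefficient, which projects the reliability of a mean score across ratings. A minimal sketch of that D-study calculation is below; the variance proportions used (8% student variance before correction, 17% after) are illustrative values in the ranges reported above, not the study's actual variance-component estimates, and the 0.70 reliability target is an assumed conventional threshold.

```python
import math

def g_coefficient(var_student, var_error, n_ratings):
    """Relative G coefficient for the mean of n_ratings ratings:
    true-score (student) variance over observed-score variance."""
    return var_student / (var_student + var_error / n_ratings)

def ratings_needed(var_student, var_error, target=0.70):
    """Smallest number of ratings whose mean reaches the target
    G coefficient (a D-study projection)."""
    n = (var_error / var_student) * (target / (1 - target))
    return math.ceil(n)

# Illustrative proportions only (not the study's estimates):
# ~8% student variance before a hawk-dove correction, ~17% after.
print(ratings_needed(0.08, 0.92))  # before correction -> 27 ratings
print(ratings_needed(0.17, 0.83))  # after correction  -> 12 ratings
```

With these illustrative inputs, shifting variance from rater/error to the student cuts the required number of ratings by more than half, mirroring the drop from 16-50 to 11-14 assessments reported above.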
Similar to prior research, most of the variance in competency assessment scores was attributable to raters, with only a small proportion attributed to the student. Making stringency-leniency corrections using rater-adjusted scores improved the psychometric characteristics of assessment scores.