Wind Stefanie A, Ge Yuan
The University of Alabama, Tuscaloosa, AL, USA.
Educ Psychol Meas. 2021 Oct;81(5):996-1022. doi: 10.1177/0013164420988108. Epub 2021 Jan 19.
Practical constraints in rater-mediated assessments limit the availability of complete data. Instead, most scoring procedures include one or two ratings for each performance, with overlapping performances across raters or linking sets of multiple-choice items to facilitate model estimation. These incomplete scoring designs make it difficult to detect rater biases, or differential rater functioning (DRF). The purpose of this study is to illustrate and explore the sensitivity of DRF indices in realistic sparse rating designs documented in the literature, which involve different types and levels of connectivity among raters and students. The results indicated that DRF can be detected in sparse rating designs, but the sensitivity of DRF indices varies across designs. We consider the implications of these findings for monitoring raters in performance assessments.
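The abstract notes that sparse designs rely on overlapping performances to link raters and students so that a measurement model can be estimated. As an illustration only, and not the procedure used in the study, the sketch below checks whether a hypothetical list of (rater, student) rating pairs forms a single connected network. The function name `is_connected_design` and the input format are assumptions made for this example.

```python
def is_connected_design(ratings):
    """Return True if all raters and students are linked through shared ratings.

    Hypothetical helper: `ratings` is an iterable of (rater_id, student_id)
    pairs, one pair per observed rating.
    """
    # Union-find over a bipartite graph whose nodes are raters and students.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for rater, student in ratings:
        # Tag node labels so a rater ID and a student ID can never collide.
        union(("rater", rater), ("student", student))

    # The design is connected when every node shares one root.
    roots = {find(node) for node in list(parent)}
    return len(roots) == 1


# Two raters scoring disjoint sets of students: no link between them.
print(is_connected_design([("R1", "S1"), ("R1", "S2"), ("R2", "S3"), ("R2", "S4")]))
# One overlapping performance (S2 rated by both raters) connects the design.
print(is_connected_design([("R1", "S1"), ("R1", "S2"), ("R2", "S2"), ("R2", "S3")]))
```

A union-find pass is used here because it handles large, sparse rating sets in near-linear time. A single overlapping performance is enough to join two otherwise separate rater groups, although the abstract indicates that how well DRF can be detected still depends on the type and level of connectivity in the design.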