Huang Hung-Yu
Department of Psychology and Counseling, University of Taipei, Taipei, Taiwan.
Appl Psychol Meas. 2023 Jun;47(4):312-327. doi: 10.1177/01466216231174566. Epub 2023 May 13.
Rater effects are commonly observed in rater-mediated assessments. In item response theory (IRT) modeling, raters can be treated as independent factors that function as instruments for measuring ratees. Most rater effects are static and can be addressed appropriately within an IRT framework, and a few models have also been developed for dynamic rater effects. Operational rating projects often require human raters to score ratees continuously and repeatedly over a certain period. This imposes a burden on raters' cognitive processing abilities and attention spans that stems from judgment fatigue and thus affects the rating quality observed during the rating period. As a result, ratees' scores may be influenced by the order in which raters grade them in a rating sequence, and this rating order effect should be considered in new IRT models. In this study, two types of many-faceted (MF)-IRT models are developed to account for such dynamic rater effects; the models assume that rater severity can drift either systematically or stochastically. The results of two simulation studies indicate that the parameters of the newly developed models can be estimated satisfactorily using Bayesian estimation and that disregarding the rating order effect produces biased estimates of the model structure parameters and of ratee proficiency. A creativity assessment is presented to demonstrate the application of the new models and to investigate the consequences of failing to detect a possible rating order effect in a real rater-mediated evaluation.
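To make the idea of drifting rater severity concrete, the following is a minimal simulation sketch. It is not the model specification from the article. It assumes a simplified dichotomous many-facet Rasch form, logit = theta - difficulty - severity. Systematic drift is assumed to be a linear trend over rating order, and stochastic drift is assumed to be a Gaussian random walk. All function names, parameter names (`slope`, `sd`), and numeric values are hypothetical and chosen for illustration only.

```python
import math
import random


def severity_path(n_ratings, base, mode, slope=0.0, sd=0.0, rng=None):
    """Return one rater's severity at each position in the rating sequence.

    mode="static":      severity stays at `base` (no rating order effect).
    mode="linear":      base + slope * position (assumed systematic drift).
    mode="random_walk": each step adds Normal(0, sd) noise (assumed stochastic drift).
    """
    rng = rng or random.Random(0)
    path = []
    current = base
    for t in range(n_ratings):
        if mode == "static":
            current = base
        elif mode == "linear":
            current = base + slope * t
        elif mode == "random_walk":
            if t > 0:  # the first rating uses the baseline severity
                current += rng.gauss(0.0, sd)
        else:
            raise ValueError(f"unknown mode: {mode}")
        path.append(current)
    return path


def prob_success(theta, difficulty, severity):
    """Assumed dichotomous many-facet Rasch probability of a score of 1."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty - severity)))


def simulate_scores(thetas, difficulty, severities, rng):
    """Draw one 0/1 score per ratee, in rating order, for a single rater."""
    return [
        1 if rng.random() < prob_success(theta, difficulty, sev) else 0
        for theta, sev in zip(thetas, severities)
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    n = 200  # hypothetical number of ratees scored in sequence
    thetas = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for mode, kwargs in [
        ("static", {}),
        ("linear", {"slope": 0.01}),
        ("random_walk", {"sd": 0.1}),
    ]:
        sev = severity_path(n, base=0.0, mode=mode, rng=rng, **kwargs)
        scores = simulate_scores(thetas, 0.0, sev, rng)
        first, last = scores[: n // 2], scores[n // 2:]
        print(mode, sum(first) / len(first), sum(last) / len(last))
```

Comparing the mean score in the first and second halves of the sequence gives a rough sense of how a rater who grows more severe lowers the scores of ratees graded later, even when those ratees are equally proficient. A full treatment would use polytomous ratings, multiple raters, and Bayesian estimation, as the abstract describes.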