Jin Kuan-Yu, Eckes Thomas
Hong Kong Examinations and Assessment Authority, 68 Gillies Avenue South, Kowloon City, Kowloon, Hong Kong.
TestDaF Institute, University of Bochum, Universitätsstr. 134, 44799 Bochum, Germany.
Behav Res Methods. 2025 Apr 21;57(5):149. doi: 10.3758/s13428-025-02667-6.
Halo effects are commonly considered a cognitive or judgmental bias leading to rating error when raters assign scores to persons or performances on multiple criteria. Though a long tradition of research has pointed to possible sources of halo effects, measurement models for identifying these sources and detecting halo have been lacking. In the present research, we propose a general mixture Rasch facets model for halo effects (MRFM-H) and derive two more specific models, each assuming a different psychological mechanism. According to the first model, MRFM-H(GI), persons evoke general impressions that guide raters when assigning scores on conceptually distinct criteria. The second model, MRFM-H(ID), assumes that raters fail to discriminate adequately between the criteria. We adopted a Bayesian inference approach to implement these models, conducting two simulation studies and a real-data analysis. In the simulation studies, we found that (a) the number of raters and criteria determined the accuracy of classifying persons as inducing or not inducing halo; (b) 90% classification accuracy was achieved when at least 25 ratings were available for each rater-person combination; (c) ignoring halo caused by either mechanism (general impressions or inadequate criterion discrimination) biased the criterion parameter estimates while having a negligible impact on person and rater estimates; (d) Bayesian data-model fit statistics (WAIC and WBIC) reliably identified the true, data-generating model. The real-data analysis highlighted the models' practical utility for examining the likely source of halo effects. The discussion focuses on the models' application in various assessment contexts and points to directions for future research.
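The abstract describes the models only verbally. As orientation, a minimal sketch of a facets-type rating model with a mixture component is given below; it assumes a standard rating scale many-facet Rasch formulation and illustrative symbols (θ, α, β, τ, z), and should not be read as the authors' MRFM-H specification, which may be parameterized differently. For rater j scoring person n on criterion i, the adjacent-categories logit for category k over k−1 can be written as

\[
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \alpha_j - \beta_i - \tau_k ,
\]

where \(\theta_n\) is person proficiency, \(\alpha_j\) rater severity, \(\beta_i\) criterion difficulty, and \(\tau_k\) the threshold between adjacent categories. A mixture extension in the spirit described above would add a latent person-level class indicator \(z_n \in \{0, 1\}\) (halo-inducing vs. not); within the halo class, the criterion structure could, for example, collapse toward a common value (mirroring inadequate criterion discrimination) or be overridden by a single general-impression effect, with class membership and model parameters estimated jointly, e.g., via Bayesian inference as in the article.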