Department of Data Science and Analytics, BI Norwegian Business School, Oslo, Norway.
Psychometrika. 2023 Sep;88(3):1002-1025. doi: 10.1007/s11336-023-09919-4. Epub 2023 Jun 8.
Several measures of agreement, such as the Perreault-Leigh coefficient, the [Formula: see text], and the recent coefficient of van Oest, are based on explicit models of how judges make their ratings. To handle such measures of agreement under a common umbrella, we propose a class of models called guessing models, which contains most models of how judges make their ratings. Every guessing model has an associated measure of agreement we call the knowledge coefficient. Under certain assumptions on the guessing models, the knowledge coefficient will be equal to the multi-rater Cohen's kappa, Fleiss' kappa, the Brennan-Prediger coefficient, or other less-established measures of agreement. We provide several sample estimators of the knowledge coefficient, valid under varying assumptions, along with their asymptotic distributions. After a sensitivity analysis and a simulation study of confidence intervals, we find that the Brennan-Prediger coefficient typically outperforms the others, with much better coverage under unfavorable circumstances.
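The coefficients named in the abstract are not defined here. As a point of reference only, the sketch below computes two of them, Fleiss' kappa and the Brennan-Prediger coefficient, from a subject-by-category count matrix using their standard textbook formulas; it does not implement the paper's guessing models or knowledge-coefficient estimators, and the variable names are illustrative assumptions.

```python
import numpy as np

def pairwise_agreement(counts):
    """Mean pairwise agreement P-bar from an n x q matrix where
    counts[i, j] is the number of raters placing subject i in category j."""
    counts = np.asarray(counts, dtype=float)
    r = counts.sum(axis=1)  # raters per subject (assumed constant)
    p_i = (np.sum(counts ** 2, axis=1) - r) / (r * (r - 1))
    return p_i.mean()

def fleiss_kappa(counts):
    """Fleiss' kappa: chance agreement from marginal category proportions."""
    counts = np.asarray(counts, dtype=float)
    p_bar = pairwise_agreement(counts)
    p_j = counts.sum(axis=0) / counts.sum()
    p_e = np.sum(p_j ** 2)
    return (p_bar - p_e) / (1 - p_e)

def brennan_prediger(counts):
    """Brennan-Prediger coefficient: chance agreement fixed at 1/q."""
    counts = np.asarray(counts, dtype=float)
    p_bar = pairwise_agreement(counts)
    q = counts.shape[1]
    return (p_bar - 1 / q) / (1 - 1 / q)

# Toy example (hypothetical data): 4 subjects, 3 raters, 3 categories.
ratings = [[3, 0, 0],
           [2, 1, 0],
           [0, 3, 0],
           [1, 1, 1]]
print(fleiss_kappa(ratings), brennan_prediger(ratings))
```

The two coefficients share the same observed-agreement term and differ only in the chance-agreement correction, which is the kind of modeling choice the paper's guessing-model framework makes explicit.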