BI Norwegian Business School.
Psychol Methods. 2019 Aug;24(4):439-451. doi: 10.1037/met0000183. Epub 2018 May 3.
We derive a general structure that encompasses important coefficients of interrater agreement such as the S-coefficient, Cohen's kappa, Scott's pi, Fleiss' kappa, Krippendorff's alpha, and Gwet's AC1. We show that these coefficients share the same set of assumptions about rater behavior; they only differ in how the unobserved category proportions are estimated. We incorporate Bayesian estimates of the category proportions and propose a new agreement coefficient with uniform prior beliefs. To correct for guessing in the process of item classification, the new coefficient emphasizes equal category probabilities if the observed frequencies are unstable due to a small sample, and the frequencies increasingly shape the coefficient as they become more stable. The proposed coefficient coincides with the S-coefficient for the hypothetical case of zero items; it converges to Scott's pi, Fleiss' kappa, and Krippendorff's alpha as the number of items increases. We use simulation to show that the proposed coefficient is as good as extant coefficients if the category proportions are equal and that it performs better if the category proportions are substantially unequal.
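The relationship between the coefficients described above can be illustrated with a minimal sketch for two raters. The abstract does not give the exact formula, so the following is an assumption: chance agreement is taken as the sum of squared category proportions, and the Bayesian estimate of each proportion is taken as the posterior mean under a uniform Dirichlet prior, (n_k + 1) / (N + q), where n_k is the pooled count of ratings in category k, N is the total number of ratings, and q is the number of categories. The function name `agreement_coefficients` and its arguments are hypothetical.

```python
from collections import Counter


def agreement_coefficients(ratings_a, ratings_b, categories):
    """Return (S, Scott's pi, Bayesian-style coefficient) for two raters.

    Assumption (not taken from the abstract): the Bayesian category
    proportions are Dirichlet posterior means under a uniform prior.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")

    n_items = len(ratings_a)
    q = len(categories)

    # Observed agreement: share of items both raters put in the same category.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n_items

    # Pool the ratings of both raters to count category usage.
    counts = Counter(ratings_a) + Counter(ratings_b)
    total = 2 * n_items

    def chance_corrected(proportions):
        # Chance agreement when both raters guess with these proportions.
        p_e = sum(p * p for p in proportions)
        if p_e == 1.0:
            return float("nan")  # undefined when one category has all the mass
        return (p_o - p_e) / (1.0 - p_e)

    uniform = [1.0 / q] * q                                   # S-coefficient
    pooled = [counts[c] / total for c in categories]          # Scott's pi
    posterior = [(counts[c] + 1) / (total + q) for c in categories]  # assumed Bayesian estimate

    return (chance_corrected(uniform),
            chance_corrected(pooled),
            chance_corrected(posterior))


if __name__ == "__main__":
    a = ["x", "x", "x", "y"]
    b = ["x", "x", "y", "y"]
    s, pi, bayes = agreement_coefficients(a, b, ["x", "y"])
    print(f"S = {s:.4f}, Scott's pi = {pi:.4f}, Bayesian sketch = {bayes:.4f}")
```

Under these assumptions, the posterior proportions equal 1/q when there are no ratings, which reproduces the S-coefficient, and they approach the pooled observed proportions as the number of ratings grows, which reproduces Scott's pi. With few items the sketch therefore lands between the two.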