Cousineau Denis, Laurencelle Louis
Université d'Ottawa, Ottawa, Ontario, Canada.
Université du Québec à Trois-Rivières, Trois-Rivières, Quebec, Canada.
Educ Psychol Meas. 2015 Dec;75(6):979-1001. doi: 10.1177/0013164415574086. Epub 2015 Mar 25.
Existing tests of interrater agreement have high statistical power; however, they lack specificity. If the two raters' ratings do not agree but are not random either, current tests, some of which are based on Cohen's kappa, will often reject the null hypothesis, leading to the wrong conclusion that agreement is present. A new test of interrater agreement, applicable to nominal or ordinal categories, is presented. The test statistic can be expressed either as a ratio (ranging from 0 to infinity) or as a proportion (ranging from 0 to 1). This test weighs information supporting agreement against information supporting disagreement. The new test's effectiveness (power and specificity) is compared with that of five other tests of interrater agreement in a series of Monte Carlo simulations. The new test, although slightly less powerful than the other tests reviewed, is the only one sensitive to agreement alone. We also introduce confidence intervals on the proportion of agreement.
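The abstract does not give the formula for the new statistic, so the sketch below instead illustrates two standard quantities it refers to: Cohen's kappa (one of the existing tests said to lack specificity) computed from a two-rater contingency table, and a Wilson score interval as one conventional way to put a confidence interval on a proportion of agreement. Both are textbook formulas, not the authors' proposed test; the function names are mine.

```python
import numpy as np

def cohen_kappa(table):
    """Cohen's kappa from a square contingency table of counts,
    rows = rater A's categories, columns = rater B's categories."""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    p_obs = np.trace(table) / n                       # observed agreement
    # chance agreement from the marginal category proportions
    p_exp = (table.sum(axis=0) * table.sum(axis=1)).sum() / n**2
    return (p_obs - p_exp) / (1.0 - p_exp)

def wilson_ci(agree, n, z=1.96):
    """Wilson score interval for a proportion of agreement
    (agree = number of agreeing pairs out of n rated items)."""
    p = agree / n
    denom = 1.0 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical 2x2 example: 50 items, 35 agreements on the diagonal.
table = [[20, 5],
         [10, 15]]
kappa = cohen_kappa(table)        # (0.70 - 0.50) / 0.50 = 0.40
lo, hi = wilson_ci(35, 50)        # 95% CI on the 0.70 agreement proportion
```

Note that kappa (0.40 here) is well below the raw agreement proportion (0.70) because the marginals make substantial agreement expected by chance; the paper's point is that a significant kappa nevertheless need not mean the raters actually agree.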