Shan Guogen, Wang Weizhen
1 Epidemiology and Biostatistics Program, Department of Environmental and Occupational Health, School of Community Health Sciences, University of Nevada Las Vegas, Las Vegas, USA.
2 College of Applied Sciences, Beijing University of Technology, Beijing, PR China.
Stat Methods Med Res. 2017 Apr;26(2):615-632. doi: 10.1177/0962280214552881. Epub 2014 Oct 6.
Cohen's kappa coefficient, κ, is a statistical measure of inter-rater (or inter-annotator) agreement for qualitative items. In this paper, we focus on interval estimation of κ in the case of two raters and binary items. Because of the complexity of κ, only asymptotic and bootstrap intervals have been available so far. However, there is no guarantee that such intervals capture κ at the desired nominal level 1 − α, so statistical inferences based on them are not reliable. We apply the Buehler method to obtain exact confidence intervals based on four widely used asymptotic intervals: three Wald-type confidence intervals and one interval constructed from a profile variance. These exact intervals are compared with regard to coverage probability and length for small to medium sample sizes. The exact intervals based on the Garner interval and the Lee and Tu interval are generally recommended for use in practice because they perform well in both coverage probability and length.
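To make the coverage concern concrete, the following is a minimal sketch that computes κ for a 2×2 agreement table, builds a simple Wald-type interval, and evaluates that interval's coverage probability by enumerating every possible table of size n. The function names, the cell labelling (a, b, c, d), and the choice of the basic large-sample standard error sqrt(po(1 − po) / (n(1 − pe)²)) are assumptions made for illustration; this is not necessarily one of the four asymptotic intervals studied in the paper, and the Buehler adjustment itself is not implemented here.

```python
import math
from statistics import NormalDist


def _agreement(a, b, c, d):
    """Return (n, observed agreement po, chance agreement pe) for a 2x2 table.

    a = both raters 'yes', d = both 'no', b and c = the two kinds of disagreement.
    """
    n = a + b + c + d
    po = (a + d) / n
    p1 = (a + b) / n  # rater 1 'yes' rate
    p2 = (a + c) / n  # rater 2 'yes' rate
    pe = p1 * p2 + (1 - p1) * (1 - p2)
    return n, po, pe


def cohen_kappa(a, b, c, d):
    """Cohen's kappa; NaN when all observations fall in one agreement cell."""
    _, po, pe = _agreement(a, b, c, d)
    if pe == 1.0:
        return math.nan
    return (po - pe) / (1 - pe)


def wald_interval(a, b, c, d, alpha=0.05):
    """Two-sided Wald-type interval using a simple large-sample SE (assumed)."""
    n, po, pe = _agreement(a, b, c, d)
    if pe == 1.0:
        return (math.nan, math.nan)
    kappa = (po - pe) / (1 - pe)
    se = math.sqrt(po * (1 - po) / (n * (1 - pe) ** 2))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return (max(-1.0, kappa - z * se), min(1.0, kappa + z * se))


def coverage(n, probs, alpha=0.05):
    """Exact coverage of the Wald interval under multinomial cell probabilities.

    probs = (p11, p10, p01, p00). Tables where kappa is undefined are
    counted as not covering (an assumption of this sketch).
    """
    p11, p10, p01, p00 = probs
    po = p11 + p00
    p1, p2 = p11 + p10, p11 + p01
    pe = p1 * p2 + (1 - p1) * (1 - p2)
    true_kappa = (po - pe) / (1 - pe)
    total = 0.0
    for a in range(n + 1):
        for b in range(n + 1 - a):
            for c in range(n + 1 - a - b):
                d = n - a - b - c
                lo, hi = wald_interval(a, b, c, d, alpha)
                if math.isnan(lo) or not (lo <= true_kappa <= hi):
                    continue
                coef = math.factorial(n) // (
                    math.factorial(a) * math.factorial(b)
                    * math.factorial(c) * math.factorial(d)
                )
                total += coef * p11**a * p10**b * p01**c * p00**d
    return total


if __name__ == "__main__":
    print(cohen_kappa(20, 5, 10, 15))        # kappa for an example table
    print(wald_interval(20, 5, 10, 15))      # nominal 95% Wald-type interval
    print(coverage(20, (0.4, 0.1, 0.1, 0.4)))  # actual coverage at n = 20
```

Comparing the value returned by `coverage` with the nominal 1 − α for several choices of n and cell probabilities shows how far an asymptotic interval can drift from its stated level in small samples, which is the gap the exact intervals are meant to close.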