Fleiss J L
Biometrics. 1975 Sep;31(3):651-9.
At least a dozen indexes have been proposed for measuring agreement between two judges on a categorical scale. Using the binary (positive-negative) case as a model, this paper presents and critically evaluates some of these proposed measures. The importance of correcting for chance-expected agreement is emphasized, and identities with intraclass correlation coefficients are pointed out.
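To make the idea of correcting for chance-expected agreement concrete, here is a minimal sketch for the binary case. It uses Cohen's kappa as one familiar chance-corrected index; the abstract does not say which indexes the paper evaluates, so this choice, the function name `chance_corrected_agreement`, and the cell labels `a`, `b`, `c`, `d` are all assumptions made for illustration.

```python
def chance_corrected_agreement(a, b, c, d):
    """Agreement between two judges on a binary (positive/negative) rating.

    Hypothetical 2x2 cell layout (an assumption, not taken from the paper):
      a = both judges positive
      b = judge 1 positive, judge 2 negative
      c = judge 1 negative, judge 2 positive
      d = both judges negative

    Returns (observed agreement, chance-expected agreement, kappa).
    """
    n = a + b + c + d
    if n == 0:
        raise ValueError("the table is empty")

    # Raw proportion of subjects on which the two judges agree.
    p_observed = (a + d) / n

    # Each judge's marginal proportion of positive ratings.
    p1_positive = (a + b) / n
    p2_positive = (a + c) / n

    # Agreement expected if the judges rated independently
    # with these same marginal proportions.
    p_expected = (p1_positive * p2_positive
                  + (1 - p1_positive) * (1 - p2_positive))

    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    # Cohen's kappa: observed agreement beyond chance, scaled by the
    # maximum possible agreement beyond chance.
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, p_expected, kappa


if __name__ == "__main__":
    p_o, p_e, kappa = chance_corrected_agreement(40, 10, 5, 45)
    print(f"observed={p_o:.2f} expected={p_e:.2f} kappa={kappa:.2f}")
```

With these made-up counts the judges agree on 85% of subjects, but half of that agreement would be expected by chance alone, so the corrected value is noticeably lower than the raw proportion.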