Barlow W, Lai MY, Azen SP.
Center for Health Studies, Group Health Cooperative, Seattle, WA 98101-1448.
Stat Med. 1991 Sep;10(9):1465-72. doi: 10.1002/sim.4780100913.
Investigators use the kappa coefficient to measure chance-corrected agreement among observers in the classification of subjects into nominal categories. The marginal probability of classification may depend, however, on one or more confounding variables. We consider assessment of interrater agreement with subjects grouped into strata on the basis of these confounders. We assume overall agreement across strata is constant and consider a stratified index of agreement, or 'stratified kappa', based on weighted summations of the individual kappas. We use three weighting schemes: (1) equal weighting; (2) weighting by the size of the table; and (3) weighting by the inverse of the variance. In a simulation study we compare these methods under differing probability structures and differing sample sizes for the tables. We find weighting by sample size moderately efficient under most conditions. We illustrate the techniques by assessing agreement between surgeons and graders of fundus photographs with respect to retinal characteristics, with stratification by initial severity of the disease.
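The stratified kappa described above is simply a weighted combination of the per-stratum kappa coefficients. The following Python sketch (not the authors' code) illustrates the three weighting schemes on two-rater contingency tables; the per-stratum variance uses a crude large-sample approximation, var(kappa) ~ p_o(1 - p_o) / (n (1 - p_e)^2), rather than the exact expression in the paper, and the two example tables are hypothetical.

    import numpy as np

    def cohen_kappa(table):
        """Cohen's kappa and an approximate variance for one k x k table of counts."""
        table = np.asarray(table, dtype=float)
        n = table.sum()
        p = table / n
        p_o = np.trace(p)                       # observed agreement
        p_e = p.sum(axis=1) @ p.sum(axis=0)     # chance agreement from the margins
        kappa = (p_o - p_e) / (1.0 - p_e)
        var = p_o * (1.0 - p_o) / (n * (1.0 - p_e) ** 2)  # simple approximation
        return kappa, var, n

    def stratified_kappa(tables, scheme="sample_size"):
        """Weighted sum of per-stratum kappas.

        scheme: 'equal', 'sample_size', or 'inverse_variance'
        """
        kappas, variances, sizes = zip(*(cohen_kappa(t) for t in tables))
        kappas = np.array(kappas)
        if scheme == "equal":
            w = np.ones_like(kappas)
        elif scheme == "sample_size":
            w = np.array(sizes, dtype=float)
        elif scheme == "inverse_variance":
            w = 1.0 / np.array(variances)
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        w = w / w.sum()
        return float(w @ kappas)

    # Hypothetical strata, e.g. mild vs. severe disease at baseline
    mild   = [[40, 5], [4, 11]]
    severe = [[10, 3], [2, 25]]
    for s in ("equal", "sample_size", "inverse_variance"):
        print(s, round(stratified_kappa([mild, severe], scheme=s), 3))

Under the assumption of a common underlying kappa across strata, all three schemes estimate the same quantity; they differ only in efficiency, which is the comparison made in the simulation study.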