Tan Kay See, Yeh Yi-Chen, Adusumilli Prasad S, Travis William D
Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York.
Department of Pathology and Laboratory Medicine, Taipei Veterans General Hospital, Taipei, Taiwan.
JTO Clin Res Rep. 2023 Dec 16;5(1):100618. doi: 10.1016/j.jtocrr.2023.100618. eCollection 2024 Jan.
Cohen's kappa is often used to quantify the agreement between two pathologists. Nevertheless, a high prevalence of the feature of interest can lead to seemingly paradoxical results, such as low Cohen's kappa values despite high "observed agreement." Here, we investigate Cohen's kappa using data from histologic subtyping assessment of lung adenocarcinomas and introduce alternative measures that can overcome this "kappa paradox."
A total of 50 frozen sections from stage I lung adenocarcinomas measuring 3 cm or less were independently reviewed by two pathologists to determine the presence or absence of five histologic patterns (lepidic, papillary, acinar, micropapillary, and solid). For each pattern, observed agreement (the proportion of cases with concordant "absent" or "present" ratings), Cohen's kappa, and Gwet's AC1 were calculated.
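For reference, the indices named above can be written out for a single pattern rated by two pathologists. The notation is an assumption introduced here for illustration and does not appear in the abstract: a is the number of cases both raters call "present," d the number both call "absent," b and c the two kinds of discordant cases, and n = a + b + c + d. These are the standard textbook definitions for two raters and two categories, not a description of the authors' software.

```latex
\begin{align*}
p_o &= \frac{a + d}{n}, \qquad
p_1 = \frac{a + b}{n}, \qquad
p_2 = \frac{a + c}{n} \\[4pt]
\text{Cohen's kappa:}\quad
p_e^{\kappa} &= p_1 p_2 + (1 - p_1)(1 - p_2), \qquad
\kappa = \frac{p_o - p_e^{\kappa}}{1 - p_e^{\kappa}} \\[4pt]
\text{Gwet's AC1:}\quad
\pi &= \frac{p_1 + p_2}{2}, \qquad
p_e^{\mathrm{AC1}} = 2\pi(1 - \pi), \qquad
\mathrm{AC1} = \frac{p_o - p_e^{\mathrm{AC1}}}{1 - p_e^{\mathrm{AC1}}} \\[4pt]
\text{Specific agreement:}\quad
\mathrm{PA} &= \frac{2a}{2a + b + c}, \qquad
\mathrm{NA} = \frac{2d}{2d + b + c}
\end{align*}
```

As the prevalence approaches 1, the chance-agreement term for kappa approaches 1 while the one for AC1 approaches 0. This is why the two indices can diverge sharply for a nearly ubiquitous feature.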
The prevalence of any amount of the histologic patterns ranged from 42% (solid) to 97% (acinar). On the basis of Cohen's kappa, there was substantial agreement for four of the five patterns (lepidic, 0.65; papillary, 0.67; micropapillary, 0.64; solid, 0.61). Acinar had the lowest Cohen's kappa (0.43, moderate agreement), despite having the highest observed agreement (88%). In contrast, Gwet's AC1 values were close to or higher than Cohen's kappa across patterns (lepidic, 0.64; papillary, 0.69; micropapillary, 0.71; solid, 0.73; acinar, 0.85). For acinar, the proportion of positive agreement was 93%, whereas the proportion of negative agreement was only 50%.
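The acinar result can be explored numerically with a minimal Python sketch. The abstract does not report the underlying 2x2 table, so the counts below are hypothetical: they were chosen only because they reproduce the reported acinar summary values for 50 cases, and they should not be read as the study's actual data. The function name `agreement_indices` and its argument names are likewise illustrative.

```python
def agreement_indices(a, b, c, d):
    """Agreement indices for two raters making present/absent calls.

    a: both raters say "present"      b: only rater 1 says "present"
    c: only rater 2 says "present"    d: both raters say "absent"
    """
    n = a + b + c + d
    p_o = (a + d) / n                  # observed agreement
    p1 = (a + b) / n                   # rater 1 "present" rate
    p2 = (a + c) / n                   # rater 2 "present" rate

    # Cohen's kappa: chance agreement from the product of the marginals
    pe_kappa = p1 * p2 + (1 - p1) * (1 - p2)
    kappa = (p_o - pe_kappa) / (1 - pe_kappa)

    # Gwet's AC1 (two categories): chance agreement from mean prevalence
    pi = (p1 + p2) / 2
    pe_ac1 = 2 * pi * (1 - pi)
    ac1 = (p_o - pe_ac1) / (1 - pe_ac1)

    # Proportions of specific (positive / negative) agreement
    pos_agreement = 2 * a / (2 * a + b + c)
    neg_agreement = 2 * d / (2 * d + b + c)

    return {
        "observed": p_o,
        "kappa": kappa,
        "ac1": ac1,
        "positive_agreement": pos_agreement,
        "negative_agreement": neg_agreement,
    }


# HYPOTHETICAL counts (not from the paper): one table of 50 cases that is
# consistent with the summary statistics reported for the acinar pattern.
hypothetical_acinar = {"a": 41, "b": 3, "c": 3, "d": 3}

result = agreement_indices(**hypothetical_acinar)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

With these assumed counts, chance agreement is about 0.79 under Cohen's kappa but about 0.21 under Gwet's AC1, so the same 88% observed agreement is scaled very differently by the two indices. The sketch does not guard against degenerate tables, such as one in which both raters call every case "present," where kappa is undefined.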
Given the dependence of Cohen's kappa on feature prevalence, interrater agreement studies should include complementary indices such as Gwet's AC1 and proportions of specific agreement, especially in settings with a high prevalence of the feature of interest.