Cancer Screening Evaluation Unit, Institute of Cancer Research, Sutton, Surrey, UK.
J Clin Pathol. 2011 Dec;64(12):1128-31. doi: 10.1136/jclinpath-2011-200229. Epub 2011 Aug 11.
A Urological Pathology External Quality Assurance (EQA) Scheme in the UK has reported observer variation in the diagnosis and grading of adenocarcinoma in prostatic biopsies using basic κ statistics, which rate all disagreements equally.
The aim of this study is to use customised weighting schemes to report κ statistics that reflect the closeness of interobserver agreement in the prostate EQA scheme.
A total of 83, 114 and 116 pathologists took part, respectively, in three web-based circulations and were classified as either expert or other readers. For analyses of diagnosis, there were 10, 8 and 8 cases in the three circulations, respectively. For analyses of Gleason Sum Score, only invasive cases were included, leaving 5, 5 and 6 cases, respectively. Analyses were conducted using customised weighting schemes with 'pairwise-weighted' κ for multiple readers.
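The idea of a weighted κ computed over many readers can be illustrated with a minimal sketch. It assumes that a multi-reader summary is obtained by averaging Cohen's weighted κ over every pair of readers; this is one common approach and is not necessarily the exact 'pairwise-weighted' formula used in the study. The function names, the linear weight matrix and the ratings are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def weighted_kappa(a, b, weights):
    """Cohen's weighted kappa for two readers.

    a, b    : lists of category indices (0..k-1), one entry per case
    weights : k x k agreement weights in [0, 1], with 1 on the diagonal
    """
    n = len(a)
    k = len(weights)
    # Observed weighted agreement across cases
    p_obs = sum(weights[x][y] for x, y in zip(a, b)) / n
    # Marginal category proportions for each reader
    pa = [a.count(c) / n for c in range(k)]
    pb = [b.count(c) / n for c in range(k)]
    # Weighted agreement expected by chance
    p_exp = sum(weights[i][j] * pa[i] * pb[j]
                for i in range(k) for j in range(k))
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_obs - p_exp) / (1.0 - p_exp)


def mean_pairwise_kappa(ratings, weights):
    """Average weighted kappa over all reader pairs (an assumed summary)."""
    pairs = list(combinations(ratings, 2))
    return sum(weighted_kappa(a, b, weights) for a, b in pairs) / len(pairs)


# Identity weights treat every disagreement equally (unweighted kappa).
IDENTITY = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

# Hypothetical linear weights: adjacent categories receive half credit.
LINEAR = [[1.0, 0.5, 0.0],
          [0.5, 1.0, 0.5],
          [0.0, 0.5, 1.0]]

# Hypothetical ratings: three readers, six cases, three ordered categories.
readers = [
    [0, 1, 2, 1, 2, 0],
    [0, 1, 2, 2, 1, 0],
    [0, 1, 2, 1, 1, 0],
]

print("unweighted:", mean_pairwise_kappa(readers, IDENTITY))
print("weighted:  ", mean_pairwise_kappa(readers, LINEAR))
```

With identity weights the function reduces to ordinary Cohen's κ; a customised matrix gives partial credit to near-misses, so disagreements between adjacent categories lower the statistic less than distant ones.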
Analysis of diagnosis for all circulations and all readers gave a composite κ value of 0.86 and a pairwise-weighted κ (κ(p-w)) value of 0.91, both regarded as 'almost perfect' agreement. The higher weighted value was due to the high proportion of responses that showed partial agreement. Analysis of Gleason Sum Score gave κ=0.38 and κ(p-w)=0.58 over all circulations and all readers, indicating that discrepancies occur at the boundary between adjacent grades and may not be as clinically significant as the composite κ suggests.
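The label 'almost perfect' matches the widely used Landis and Koch descriptors for κ. The short sketch below assumes that this is the scale the abstract refers to; the function name is hypothetical. It maps the reported values to their descriptive bands.

```python
def landis_koch_band(kappa):
    """Return the Landis and Koch (1977) descriptor for a kappa value."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Values reported in the abstract
for label, value in [("diagnosis, composite", 0.86),
                     ("diagnosis, pairwise-weighted", 0.91),
                     ("Gleason Sum Score, composite", 0.38),
                     ("Gleason Sum Score, pairwise-weighted", 0.58)]:
    print(f"{label}: {value} ({landis_koch_band(value)})")
```

On this scale, weighting moves agreement on Gleason Sum Score from 'fair' to 'moderate', while agreement on diagnosis is 'almost perfect' under both statistics.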
Weighted κ statistics show higher levels of agreement than previously reported because they apply weights that reflect the relative importance of different types of discordance in diagnosis or grading. Agreement on grading remained low.