High Agreement and High Prevalence: The Paradox of Cohen's Kappa

Authors

Zec Slavica, Soriani Nicola, Comoretto Rosanna, Baldi Ileana

Affiliations

Department of Cardiac, Thoracic and Vascular Sciences, Unit of Biostatistics, Epidemiology and Public Health, University of Padova, Padova, Italy.

Department of Statistics and Quantitative Methods, University of Milan, Bicocca, Italy.

Publication information

Open Nurs J. 2017 Oct 31;11:211-218. doi: 10.2174/1874434601711010211. eCollection 2017.

DOI:10.2174/1874434601711010211

PMID:29238424

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5712640/

Abstract

BACKGROUND

Cohen's kappa is the most widely used agreement statistic in the literature. Under certain conditions, however, notably when observed agreement is high and one category is highly prevalent, it is affected by a paradox that yields biased estimates of the statistic itself.

OBJECTIVE

This study aims to give readers enough information to make an informed choice of agreement measure by highlighting, with a real-data example, some desirable properties of Gwet's AC1 in comparison with Cohen's kappa.

METHOD

During a literature review, we asked a panel of three evaluators to judge the quality of 57 randomized controlled trials, assigning each trial a score on the Jadad scale. Quality was evaluated along the following dimensions: adopted design, randomization unit, and type of primary endpoint. For each of these features, agreement among the three evaluators was calculated using Cohen's kappa and Gwet's AC1, and the resulting values were then compared with the observed agreement.
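For orientation, the two statistics can be written side by side. The block below gives the standard two-rater definitions for K categories; it is a reference sketch rather than the authors' exact computation, since the abstract does not say whether the three evaluators were compared pairwise or with a multi-rater extension. The symbols p_o (observed agreement), p_{k+} and p_{+k} (marginal proportions of each rater for category k), and \pi_k are notation introduced here, not taken from the abstract.

```latex
% Both statistics share the same chance-corrected form and differ only in p_e.
\kappa = \frac{p_o - p_e^{\kappa}}{1 - p_e^{\kappa}},
\qquad
p_e^{\kappa} = \sum_{k=1}^{K} p_{k+}\, p_{+k}

\mathrm{AC1} = \frac{p_o - p_e^{\mathrm{AC1}}}{1 - p_e^{\mathrm{AC1}}},
\qquad
p_e^{\mathrm{AC1}} = \frac{1}{K-1} \sum_{k=1}^{K} \pi_k \,(1 - \pi_k),
\qquad
\pi_k = \frac{p_{k+} + p_{+k}}{2}
```

When one category dominates, the kappa chance term approaches 1, so even a high p_o leaves little room in the numerator; the AC1 chance term instead approaches 0, which keeps AC1 close to p_o.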

RESULTS

The values of Cohen's kappa would lead one to believe that agreement for the variables Unit, Design, and Primary Endpoint is entirely unsatisfactory. The AC1 statistic, by contrast, shows plausible values that are in line with the corresponding observed agreement.
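The pattern described here can be reproduced on a small made-up example. The sketch below computes observed agreement, Cohen's kappa, and Gwet's AC1 for two raters from a contingency table, using the standard two-rater definitions. The 2x2 table is hypothetical (57 items chosen only to echo the sample size); it is not the study's data, and the function names are illustrative.

```python
def _proportions(table):
    """Return (K, observed agreement, row marginals, column marginals)."""
    n = sum(sum(row) for row in table)
    k = len(table)
    p_o = sum(table[i][i] for i in range(k)) / n
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    return k, p_o, row, col


def cohen_kappa(table):
    """Cohen's kappa for two raters (rows: rater A, columns: rater B)."""
    k, p_o, row, col = _proportions(table)
    # Chance agreement: product of the two raters' marginals per category.
    p_e = sum(row[i] * col[i] for i in range(k))
    return (p_o - p_e) / (1 - p_e)


def gwet_ac1(table):
    """Gwet's AC1 for two raters, same table layout as cohen_kappa."""
    k, p_o, row, col = _proportions(table)
    # Chance agreement: based on the average marginal pi_k of each category.
    pi = [(row[i] + col[i]) / 2 for i in range(k)]
    p_e = sum(p * (1 - p) for p in pi) / (k - 1)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical data: both raters choose category 1 for almost every item.
table = [[52, 2],
         [2, 1]]

observed = _proportions(table)[1]
kappa = cohen_kappa(table)
ac1 = gwet_ac1(table)

print(f"observed agreement = {observed:.3f}")  # 0.930
print(f"Cohen's kappa      = {kappa:.3f}")     # 0.296
print(f"Gwet's AC1         = {ac1:.3f}")       # 0.922
```

With 93% raw agreement, kappa falls to about 0.30 because the skewed marginals push its chance-agreement term to roughly 0.90, while AC1 stays near the observed agreement.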

CONCLUSION

We conclude that it would always be appropriate to adopt the AC1 statistic, thereby avoiding the risk of incurring the paradox and drawing wrong conclusions from the agreement analysis.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9288/5712640/1b83f3e5015d/TONURSJ-11-211_F1.jpg
