O'Leary Shaun, Lund Marte, Ytre-Hauge Tore Johan, Holm Sigrid Reiersen, Naess Kaja, Dalland Lars Nagelstad, McPhail Steven M
NHMRC Centre for Clinical Research Excellence in Spinal Pain, Injury and Health, University of Queensland, Brisbane, QLD 4072, Australia; Physiotherapy Department, Royal Brisbane and Women's Hospital, Queensland Health, Herston, Brisbane, QLD 4029, Australia.
NHMRC Centre for Clinical Research Excellence in Spinal Pain, Injury and Health, University of Queensland, Brisbane, QLD 4072, Australia; Norwegian Sports Medicine Clinic (NIMI), Oslo, Norway.
Physiotherapy. 2014 Mar;100(1):27-35. doi: 10.1016/j.physio.2013.08.002. Epub 2013 Nov 18.
To compare different reliability coefficients (exact agreement, and variations of the kappa coefficient (generalised, Cohen's, and Prevalence-Adjusted Bias-Adjusted (PABAK))) for four physiotherapists conducting visual assessments of scapulae.
Inter-therapist reliability study.
Research laboratory.
Thirty individuals with no history of neck or shoulder pain and no obvious postural abnormalities were recruited.
Ratings of scapular posture were recorded in multiple biomechanical planes under four test conditions (at rest, and during three isometric tasks) by four physiotherapists.
The magnitude of discrepancy between coefficients for the two therapist pairs ranged from 0.04 to 0.76 for Cohen's kappa and from 0.00 to 0.86 for PABAK. In comparison, the generalised kappa provided a score between the two paired kappa coefficients. The differences between the mean generalised kappa and the mean Cohen's kappa (0.02), and between the mean generalised kappa and the mean PABAK (0.02), were negligible, but the magnitude of difference between the generalised kappa and the paired kappa within each plane and condition was substantial: 0.02 to 0.57 for Cohen's kappa and 0.02 to 0.63 for PABAK.
Calculating coefficients for therapist pairs alone may yield inconsistent findings. In contrast, the generalised kappa provided a coefficient close to the mean of the paired kappa coefficients. These findings support the assertion that generalised kappa may better represent reliability between three or more raters, and that reliability studies calculating agreement between only two raters should be interpreted with caution. However, generalised kappa may mask more extreme cases of agreement (or disagreement) that paired comparisons can reveal.
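The three coefficients compared in this study can be illustrated with minimal implementations. The sketch below uses hypothetical binary ratings (not the study's data) to show how Cohen's kappa, PABAK, and the generalised (Fleiss') kappa are computed, and how a pairwise coefficient can differ from the multi-rater one.

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two raters over the same items (nominal categories)."""
    n = len(a)
    cats = sorted(set(a) | set(b))
    po = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum((ca[c] / n) * (cb[c] / n) for c in cats)   # chance agreement from marginals
    return (po - pe) / (1 - pe)

def pabak(a, b, k):
    """Prevalence-Adjusted Bias-Adjusted Kappa: fixes chance agreement at 1/k."""
    po = sum(x == y for x, y in zip(a, b)) / len(a)
    return (k * po - 1) / (k - 1)

def fleiss_kappa(counts, k):
    """Generalised (Fleiss') kappa. counts: one row per item, giving the number
    of raters who chose each of the k categories (rows sum to the rater count m)."""
    n = len(counts)
    m = sum(counts[0])
    # mean per-item agreement across rater pairs
    p_bar = sum((sum(c * c for c in row) - m) / (m * (m - 1)) for row in counts) / n
    # chance agreement from overall category proportions
    p_j = [sum(row[j] for row in counts) / (n * m) for j in range(k)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical ratings of 6 scapulae by 4 raters (0 = normal, 1 = altered).
r1 = [0, 0, 1, 1, 0, 1]
r2 = [0, 1, 1, 1, 0, 1]
r3 = [0, 0, 1, 0, 0, 1]
r4 = [0, 0, 1, 1, 1, 1]
raters = [r1, r2, r3, r4]
counts = [[sum(r[i] == c for r in raters) for c in (0, 1)] for i in range(6)]

print(round(cohens_kappa(r1, r2), 3))  # 0.667 for one pair
print(round(cohens_kappa(r3, r4), 3))  # 0.4 for another pair
print(round(pabak(r3, r4, 2), 3))      # 0.333 (same pair, chance fixed at 0.5)
print(round(fleiss_kappa(counts, 2), 3))  # 0.497 across all four raters
```

Note how the two pairwise Cohen's kappas (0.667 and 0.4) bracket the generalised kappa (0.497), mirroring the pattern reported above: the multi-rater coefficient summarises agreement across all raters but hides the spread between individual pairs.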