Department of Data Science and Analytics, BI Norwegian Business School, Oslo, Norway.
Psychometrika. 2023 Sep;88(3):1002-1025. doi: 10.1007/s11336-023-09919-4. Epub 2023 Jun 8.
Several measures of agreement, such as the Perreault-Leigh coefficient, the [Formula: see text], and the recent coefficient of van Oest, are based on explicit models of how judges make their ratings. To handle such measures of agreement under a common umbrella, we propose a class of models called guessing models, which contains most models of how judges make their ratings. Every guessing model has an associated measure of agreement we call the knowledge coefficient. Under certain assumptions on the guessing models, the knowledge coefficient will be equal to the multi-rater Cohen's kappa, Fleiss' kappa, the Brennan-Prediger coefficient, or other less-established measures of agreement. We provide several sample estimators of the knowledge coefficient, valid under varying assumptions, along with their asymptotic distributions. After a sensitivity analysis and a simulation study of confidence intervals, we find that the Brennan-Prediger coefficient typically outperforms the others, with much better coverage under unfavorable circumstances.
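The coefficients named in the abstract are not defined here. As a point of reference only, the sketch below computes two of them, Fleiss' kappa and the Brennan-Prediger coefficient, from a subject-by-category count matrix using their standard textbook formulas; it does not implement the paper's guessing models or knowledge-coefficient estimators, and the variable names are illustrative assumptions.

```python
import numpy as np

def pairwise_agreement(counts):
    """Mean pairwise agreement P-bar from an n x q matrix where
    counts[i, j] is the number of raters placing subject i in category j."""
    counts = np.asarray(counts, dtype=float)
    r = counts.sum(axis=1)  # raters per subject (assumed constant)
    p_i = (np.sum(counts ** 2, axis=1) - r) / (r * (r - 1))
    return p_i.mean()

def fleiss_kappa(counts):
    """Fleiss' kappa: chance agreement from marginal category proportions."""
    counts = np.asarray(counts, dtype=float)
    p_bar = pairwise_agreement(counts)
    p_j = counts.sum(axis=0) / counts.sum()
    p_e = np.sum(p_j ** 2)
    return (p_bar - p_e) / (1 - p_e)

def brennan_prediger(counts):
    """Brennan-Prediger coefficient: chance agreement fixed at 1/q."""
    counts = np.asarray(counts, dtype=float)
    p_bar = pairwise_agreement(counts)
    q = counts.shape[1]
    return (p_bar - 1 / q) / (1 - 1 / q)

# Toy example (hypothetical data): 4 subjects, 3 raters, 3 categories.
ratings = [[3, 0, 0],
           [2, 1, 0],
           [0, 3, 0],
           [1, 1, 1]]
print(fleiss_kappa(ratings), brennan_prediger(ratings))
```

The two coefficients share the same observed-agreement term and differ only in the chance-agreement correction, which is the kind of modeling choice the paper's guessing-model framework makes explicit.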