
Measures of Agreement with Multiple Raters: Fréchet Variances and Inference.

Affiliation

Department of Data Science and Analytics, BI Norwegian Business School, Oslo, Norway.

Publication

Psychometrika. 2024 Jun;89(2):517-541. doi: 10.1007/s11336-023-09945-2. Epub 2024 Jan 8.

DOI: 10.1007/s11336-023-09945-2
PMID: 38190018
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11164747/
Abstract

Most measures of agreement are chance-corrected. They differ in three dimensions: their definition of chance agreement, their choice of disagreement function, and how they handle multiple raters. Chance agreement is usually defined in a pairwise manner, following either Cohen's kappa or Fleiss's kappa. The disagreement function is usually a nominal, quadratic, or absolute value function. But how to handle multiple raters is contentious, with the main contenders being Fleiss's kappa, Conger's kappa, and Hubert's kappa, the variant of Fleiss's kappa where agreement is said to occur only if every rater agrees. More generally, multi-rater agreement coefficients can be defined in a g-wise way, where the disagreement weighting function uses g raters instead of two. This paper contains two main contributions. (a) We propose using Fréchet variances to handle the case of multiple raters. The Fréchet variances are intuitive disagreement measures and turn out to generalize the nominal, quadratic, and absolute value functions to the case of more than two raters. (b) We derive the limit theory of g-wise weighted agreement coefficients, with chance agreement of the Cohen-type or Fleiss-type, for the case where every item is rated by the same number of raters. Trying out three confidence interval constructions, we end up recommending calculating confidence intervals using the arcsine transform or the Fisher transform.
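To make the abstract's construction concrete, here is a minimal illustrative sketch (not the authors' estimators or limit theory): for the quadratic disagreement function, the Fréchet variance of an item's ratings is just their ordinary variance, and a chance-corrected coefficient compares mean within-item disagreement to Fleiss-type chance disagreement from the pooled ratings. The function names are invented for this sketch, and the standard error is estimated by a nonparametric bootstrap over items as a stand-in for the paper's closed-form asymptotics; only the Fisher (arctanh) transform for the interval follows the abstract's recommendation.

```python
import numpy as np
from statistics import NormalDist

def frechet_disagreement(row):
    """Quadratic Fréchet variance of one item's ratings.

    For the Euclidean metric, the Fréchet variance reduces to the
    ordinary (population) variance of the ratings about their mean.
    """
    return np.var(row)

def agreement_coefficient(ratings):
    """1 - (observed disagreement) / (chance disagreement).

    `ratings` has shape (n_items, n_raters); every item is rated by the
    same number of raters, as in the paper's setting. Chance disagreement
    is Fleiss-type: raters drawn from the pooled marginal distribution.
    """
    ratings = np.asarray(ratings, dtype=float)
    observed = np.mean([frechet_disagreement(row) for row in ratings])
    expected = np.var(ratings.ravel())
    return 1.0 - observed / expected

def fisher_ci(ratings, level=0.95, n_boot=2000, seed=0):
    """Confidence interval on the Fisher (arctanh) scale.

    Illustration only: the standard error comes from a bootstrap over
    items, not from the paper's derived limit theory.
    """
    ratings = np.asarray(ratings, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(ratings)
    z = np.arctanh(agreement_coefficient(ratings))
    boots = np.arctanh([
        agreement_coefficient(ratings[rng.integers(0, n, n)])
        for _ in range(n_boot)
    ])
    q = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = boots.std(ddof=1)
    return float(np.tanh(z - q * se)), float(np.tanh(z + q * se))
```

By the law of total variance, pooled variance equals mean within-item variance plus variance of item means, so this coefficient lies in [0, 1]: it is 1 under perfect agreement (zero within-item variance) and 0 when items are indistinguishable from pooled chance. Transforming to the arctanh scale before adding the normal-quantile margin keeps the back-transformed interval inside (-1, 1).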


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fda4/11164747/9d9977687e3d/11336_2023_9945_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fda4/11164747/71ab88019582/11336_2023_9945_Fig2_HTML.jpg

Similar Articles

1
Measures of Agreement with Multiple Raters: Fréchet Variances and Inference.
Psychometrika. 2024 Jun;89(2):517-541. doi: 10.1007/s11336-023-09945-2. Epub 2024 Jan 8.
2
Hubert's multi-rater kappa revisited.
Br J Math Stat Psychol. 2020 Feb;73(1):1-22. doi: 10.1111/bmsp.12167. Epub 2019 May 6.
3
A comparison of Cohen's Kappa and Gwet's AC1 when calculating inter-rater reliability coefficients: a study conducted with personality disorder samples.
BMC Med Res Methodol. 2013 Apr 29;13:61. doi: 10.1186/1471-2288-13-61.
4
Simulating and estimating agreement in the presence of multiple raters and covariates.
Stat Med. 2023 May 20;42(11):1687-1698. doi: 10.1002/sim.9694. Epub 2023 Mar 5.
5
Homogeneity score test of AC statistics and estimation of common AC in multiple or stratified inter-rater agreement studies.
BMC Med Res Methodol. 2020 Feb 5;20(1):20. doi: 10.1186/s12874-019-0887-5.
6
Degenerative findings in lumbar spine MRI: an inter-rater reliability study involving three raters.
Chiropr Man Therap. 2020 Feb 11;28(1):8. doi: 10.1186/s12998-020-0297-0.
7
Testing the Difference of Correlated Agreement Coefficients for Statistical Significance.
Educ Psychol Meas. 2016 Aug;76(4):609-637. doi: 10.1177/0013164415596420. Epub 2015 Jul 28.
8
Pitfalls in the use of kappa when interpreting agreement between multiple raters in reliability studies.
Physiotherapy. 2014 Mar;100(1):27-35. doi: 10.1016/j.physio.2013.08.002. Epub 2013 Nov 18.
9
Overall indices for assessing agreement among multiple raters.
Stat Med. 2018 Dec 10;37(28):4200-4215. doi: 10.1002/sim.7912. Epub 2018 Jul 30.
10
Weighting schemes and incomplete data: A generalized Bayesian framework for chance-corrected interrater agreement.
Psychol Methods. 2022 Dec;27(6):1069-1088. doi: 10.1037/met0000412. Epub 2021 Nov 11.

Cited By

1
A comprehensive guide to study the agreement and reliability of multi-observer ordinal data.
BMC Med Res Methodol. 2024 Dec 20;24(1):310. doi: 10.1186/s12874-024-02431-y.
2
Measuring Agreement Using Guessing Models and Knowledge Coefficients.
Psychometrika. 2023 Sep;88(3):1002-1025. doi: 10.1007/s11336-023-09919-4. Epub 2023 Jun 8.

References

1
Measuring Agreement Using Guessing Models and Knowledge Coefficients.
Psychometrika. 2023 Sep;88(3):1002-1025. doi: 10.1007/s11336-023-09919-4. Epub 2023 Jun 8.
2
Hubert's multi-rater kappa revisited.
Br J Math Stat Psychol. 2020 Feb;73(1):1-22. doi: 10.1111/bmsp.12167. Epub 2019 May 6.
3
A new coefficient of interrater agreement: The challenge of highly unequal category proportions.
Psychol Methods. 2019 Aug;24(4):439-451. doi: 10.1037/met0000183. Epub 2018 May 3.
4
Measuring inter-rater reliability for nominal data - which coefficients and confidence intervals are appropriate?
BMC Med Res Methodol. 2016 Aug 5;16:93. doi: 10.1186/s12874-016-0200-9.
5
The arcsine is asinine: the analysis of proportions in ecology.
Ecology. 2011 Jan;92(1):3-10. doi: 10.1890/10-0340.1.
6
Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit.
Psychol Bull. 1968 Oct;70(4):213-20. doi: 10.1037/h0026256.
7
Weighted kappa for multiple raters.
Percept Mot Skills. 2008 Dec;107(3):837-48. doi: 10.2466/pms.107.3.837-848.
8
Computing inter-rater reliability and its variance in the presence of high agreement.
Br J Math Stat Psychol. 2008 May;61(Pt 1):29-48. doi: 10.1348/000711006X126600.
9
Estimating the generalized concordance correlation coefficient through variance components.
Biometrics. 2003 Dec;59(4):849-58. doi: 10.1111/j.0006-341x.2003.00099.x.
10
Psychiatric diagnosis: a comparative study in North Carolina, London and Glasgow.
Br J Psychiatry. 1968 Jan;114(506):1-9. doi: 10.1192/bjp.114.506.1.