
The impact of grey zones on the accuracy of agreement measures for ordinal tables.

Affiliations

An Giang University, VNU-HCM, Long Xuyen City, An Giang Province, 076, Vietnam.

Mathematical Sciences, School of Science, RMIT University, Melbourne, Victoria, 3000, Australia.

Publication information

BMC Med Res Methodol. 2021 Apr 14;21(1):70. doi: 10.1186/s12874-021-01248-3.

DOI:10.1186/s12874-021-01248-3
PMID:33853549
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8048180/
Abstract

BACKGROUND

In an inter-rater agreement study, if two raters weigh different aspects of the subject of interest or have different levels of experience, a grey zone occurs among the levels of the square contingency table that shows their agreement. Grey zones distort the measured degree of agreement between raters and negatively affect decisions based on inter-rater agreement tables. It is therefore important to know how a grey zone affects the inter-rater agreement coefficients, so that the coefficient most reliable against grey zones can be chosen and more reliable decisions reached.

METHODS

In this article, we propose two approaches to create grey zones in a simulation setting and conduct an extensive Monte Carlo simulation study to determine the impact of grey zones on the weighted inter-rater agreement measures for ordinal tables over a comprehensive simulation space.
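The idea of a Monte Carlo study of this kind can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the abstract does not describe the two grey-zone approaches, so `add_grey_zone` below is a hypothetical mechanism (part of one diagonal cell is moved to the adjacent category), and the cell probabilities, sample size, and replication count are made-up values rather than the paper's simulation design. Cohen's weighted kappa with quadratic weights stands in for the weighted coefficients under study.

```python
import numpy as np


def quadratic_weights(q):
    """Quadratic agreement weights: 1 on the diagonal, 0 at maximal distance."""
    idx = np.arange(q)
    return 1.0 - (idx[:, None] - idx[None, :]) ** 2 / (q - 1) ** 2


def weighted_kappa(table, weights):
    """Cohen's weighted kappa from a q x q table of counts or probabilities."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    p_a = (weights * p).sum()  # weighted observed agreement
    p_e = (weights * np.outer(p.sum(axis=1), p.sum(axis=0))).sum()  # chance agreement
    return (p_a - p_e) / (1.0 - p_e)


def add_grey_zone(cell_probs, k, shift):
    """HYPOTHETICAL grey zone (not the paper's method): a fraction `shift` of the
    cases both raters would place in category k is rated k + 1 by the second rater."""
    probs = cell_probs.copy()
    moved = shift * probs[k, k]
    probs[k, k] -= moved
    probs[k, k + 1] += moved
    return probs


def mean_estimate(cell_probs, n, reps, rng):
    """Average weighted kappa over `reps` simulated tables of `n` subjects each."""
    q = cell_probs.shape[0]
    w = quadratic_weights(q)
    draws = rng.multinomial(n, cell_probs.ravel(), size=reps)
    return float(np.mean([weighted_kappa(d.reshape(q, q), w) for d in draws]))


# Assumed population: 4 ordinal categories, 60% of the mass on the diagonal.
q = 4
true_probs = np.full((q, q), 0.4 / (q * (q - 1)))
np.fill_diagonal(true_probs, 0.6 / q)
grey_probs = add_grey_zone(true_probs, k=1, shift=0.5)

rng = np.random.default_rng(0)
w = quadratic_weights(q)
true_kappa = weighted_kappa(true_probs, w)
est_clean = mean_estimate(true_probs, n=200, reps=1000, rng=rng)
est_grey = mean_estimate(grey_probs, n=200, reps=1000, rng=rng)
print(f"true kappa: {true_kappa:.3f}")
print(f"mean estimate without grey zone: {est_clean:.3f}")
print(f"mean estimate with grey zone:    {est_grey:.3f}")
```

The gap between the grey-zone estimate and the true value is the kind of accuracy loss such a study measures. Repeating the comparison over a grid of sample sizes, table sizes, and agreement levels would give a simulation space of the sort the abstract describes.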

RESULTS

The weighted inter-rater agreement coefficients are not reliable in the presence of grey zones. When there is a grey zone, increasing the sample size and the number of categories in the agreement table decreases the accuracy of the weighted inter-rater agreement measures. When the degree of agreement between the raters is high, the agreement measures are not significantly affected by grey zones. However, when inter-rater agreement is medium to low, all the weighted coefficients are affected to some degree.

CONCLUSIONS

This study shows that grey zones have a significant negative impact on the accuracy of agreement measures, especially when the true agreement is low and the sample and table sizes are large. In general, Gwet's AC2 and Brennan-Prediger's κ with quadratic or ordinal weights are reliable against grey zones.
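For readers unfamiliar with the coefficients named above, the following Python sketch computes Cohen's weighted κ, the weighted Brennan-Prediger coefficient, and Gwet's AC2 for a two-rater table. It uses the standard textbook definitions, which differ only in the chance-agreement term, and covers quadratic weights only. The function names and the example table are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def quadratic_weights(q):
    """Quadratic agreement weights w_kl = 1 - (k - l)^2 / (q - 1)^2."""
    idx = np.arange(q)
    return 1.0 - (idx[:, None] - idx[None, :]) ** 2 / (q - 1) ** 2


def agreement_coefficients(table, weights):
    """Weighted agreement coefficients for a q x q two-rater table.

    Each coefficient has the form (p_a - p_e) / (1 - p_e); only the
    chance-agreement term p_e differs between them.
    """
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    q = p.shape[0]
    row, col = p.sum(axis=1), p.sum(axis=0)

    p_a = (weights * p).sum()  # weighted observed agreement
    t_w = weights.sum()        # sum of all weights

    # Cohen: chance agreement from the product of the two raters' marginals.
    pe_kappa = (weights * np.outer(row, col)).sum()
    # Brennan-Prediger: chance agreement assumes uniform category use.
    pe_bp = t_w / q**2
    # Gwet's AC2: chance agreement from the averaged marginals pi_k.
    pi = (row + col) / 2.0
    pe_ac2 = t_w / (q * (q - 1)) * (pi * (1.0 - pi)).sum()

    return {
        "cohen_kappa": (p_a - pe_kappa) / (1.0 - pe_kappa),
        "brennan_prediger": (p_a - pe_bp) / (1.0 - pe_bp),
        "gwet_ac2": (p_a - pe_ac2) / (1.0 - pe_ac2),
    }


# Made-up 3-category table: rows are rater A, columns are rater B.
example = np.array([[20, 5, 1],
                    [4, 15, 6],
                    [1, 3, 25]])
for name, value in agreement_coefficients(example, quadratic_weights(3)).items():
    print(f"{name}: {value:.3f}")
```

Because the Brennan-Prediger chance term does not depend on the observed marginals, and AC2 depends on them only through the averaged proportions, shifts in one rater's marginal distribution enter these coefficients differently than they enter Cohen's κ.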

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fd3/8048180/e88aaa87ce3a/12874_2021_1248_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fd3/8048180/91821f6d0cc9/12874_2021_1248_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fd3/8048180/1dcca8b61556/12874_2021_1248_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fd3/8048180/70100a6cf0d6/12874_2021_1248_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fd3/8048180/e9adfc6a0870/12874_2021_1248_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fd3/8048180/797790096323/12874_2021_1248_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fd3/8048180/dbad643ad07e/12874_2021_1248_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6fd3/8048180/0923cc385fc6/12874_2021_1248_Fig8_HTML.jpg
