An Giang University, VNU-HCM, Long Xuyen City, An Giang Province, 076, Vietnam.
Mathematical Sciences, School of Science, RMIT University, Melbourne, Victoria, 3000, Australia.
BMC Med Res Methodol. 2021 Apr 14;21(1):70. doi: 10.1186/s12874-021-01248-3.
In an inter-rater agreement study, if two raters tend to focus on different aspects of the subject of interest or have different levels of experience, a grey zone arises among the levels of the square contingency table that shows the inter-rater agreement. These grey zones distort the degree of agreement between raters and negatively affect decisions based on inter-rater agreement tables. It is therefore important to know how the existence of a grey zone affects inter-rater agreement coefficients, so that the coefficient most reliable against grey zones can be chosen and more reliable decisions reached.
In this article, we propose two approaches for creating grey zones in a simulation setting and conduct an extensive Monte Carlo simulation study to determine the impact of grey zones on weighted inter-rater agreement measures for ordinal tables over a comprehensive simulation space.
The weighted inter-rater agreement coefficients are not reliable against the existence of grey zones. Increasing the sample size and the number of categories in the agreement table decreases the accuracy of weighted inter-rater agreement measures when there is a grey zone. When the degree of agreement between the raters is high, the agreement measures are not significantly affected by the existence of grey zones. However, if the degree of inter-rater agreement is medium to low, all the weighted coefficients are affected to some degree.
This study shows that the existence of grey zones has a significant negative impact on the accuracy of agreement measures, especially for a low degree of true agreement and large sample and table sizes. In general, Gwet's AC2 and Brennan-Prediger's κ with quadratic or ordinal weights are reliable against the grey zones.
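To make the two coefficients named above concrete, the following is a minimal sketch of how Brennan-Prediger's κ and Gwet's AC2 with quadratic weights might be computed from a square table of counts. It assumes the commonly cited textbook definitions of these coefficients (observed weighted agreement corrected by a chance term); the abstract does not give the formulas, so the exact forms used in the study are an assumption here. The function names `quadratic_weights` and `weighted_agreement` are hypothetical, chosen for illustration only.

```python
def quadratic_weights(q):
    """Quadratic agreement weights: 1 on the diagonal, 0 at maximum distance."""
    return [[1.0 - ((k - l) ** 2) / ((q - 1) ** 2) for l in range(q)]
            for k in range(q)]


def weighted_agreement(table):
    """Return Brennan-Prediger kappa and Gwet's AC2 for a q x q table of counts.

    Assumed definitions (not taken from the abstract):
      p_a      = sum_kl w_kl * p_kl
      p_e(BP)  = T_w / q^2
      p_e(AC2) = T_w / (q * (q - 1)) * sum_k pi_k * (1 - pi_k)
    where T_w is the sum of all weights and pi_k is the average of the
    row and column marginal proportions for category k.
    """
    q = len(table)
    n = sum(sum(row) for row in table)
    p = [[cell / n for cell in row] for row in table]
    w = quadratic_weights(q)

    # Weighted observed agreement.
    p_a = sum(w[k][l] * p[k][l] for k in range(q) for l in range(q))
    t_w = sum(sum(row) for row in w)

    # Average marginal proportion per category.
    row_marg = [sum(p[k]) for k in range(q)]
    col_marg = [sum(p[k][l] for k in range(q)) for l in range(q)]
    pi = [(row_marg[k] + col_marg[k]) / 2.0 for k in range(q)]

    # Chance-agreement terms for each coefficient.
    pe_bp = t_w / q ** 2
    pe_ac2 = t_w / (q * (q - 1)) * sum(x * (1.0 - x) for x in pi)

    return {
        "brennan_prediger": (p_a - pe_bp) / (1.0 - pe_bp),
        "gwet_ac2": (p_a - pe_ac2) / (1.0 - pe_ac2),
    }


# Illustrative 3-category table (made-up counts, not data from the study).
example = [[20, 5, 1],
           [4, 18, 6],
           [1, 5, 20]]
result = weighted_agreement(example)
```

Both coefficients share the same observed-agreement term and differ only in the chance correction, which is one plausible reason their sensitivity to grey zones can differ from that of coefficients whose chance term depends more heavily on the marginals.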