Vanbelle Sophie, Engelhart Christina Hernandez, Blix Ellen
Methodology and Statistics, CAPHRI, Maastricht University, P. Debyeplein, 1, Maastricht, 6229 HA, The Netherlands.
Norwegian Research Center for Women's Health, Oslo University Hospital, P.O. Box 4950 Nydalen, Oslo, N-0424, Norway.
BMC Med Res Methodol. 2024 Dec 20;24(1):310. doi: 10.1186/s12874-024-02431-y.
A recent systematic review revealed problems with how agreement and reliability studies for ordinal scales are performed and reported, especially when more than two observers are involved. This paper therefore aims to provide all the information needed to choose among the most meaningful and most widely used measures and to plan agreement and reliability studies for ordinal outcomes.
This paper considers the generalisation of the proportion of (dis)agreement, the mean absolute deviation, the mean squared deviation and weighted kappa coefficients to more than two observers rating an ordinal outcome.
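To make the four families of measures concrete, here is a minimal Python sketch for R ≥ 2 raters. It assumes that ratings are coded as integers 0..K-1, that the categories can be treated as equally spaced scores for the mean absolute and mean squared deviations, and that each multi-rater measure is obtained by averaging over all pairs of raters. For weighted kappa, observed and chance agreement are each averaged over rater pairs, with chance agreement computed from each pair's own marginal distributions (similar in spirit to Conger's multi-rater kappa). These definitions, the linear and quadratic weighting schemes, and the names `weight_matrix` and `multirater_measures` are illustrative assumptions, not necessarily the exact coefficients used in the paper or its R package.

```python
import itertools

import numpy as np


def weight_matrix(n_categories, kind="quadratic"):
    """Agreement weights w_ij in [0, 1], equal to 1 on the diagonal."""
    idx = np.arange(n_categories)
    dist = np.abs(idx[:, None] - idx[None, :]) / (n_categories - 1)
    if kind == "linear":
        return 1.0 - dist
    if kind == "quadratic":
        return 1.0 - dist**2
    raise ValueError(f"unknown weighting scheme: {kind}")


def multirater_measures(ratings, n_categories, kind="quadratic"):
    """Pairwise-averaged agreement and reliability measures for R >= 2 raters.

    ratings: (n_subjects, n_raters) array of category codes 0..K-1.
    Categories are treated as equally spaced scores for MAD and MSD.
    """
    x = np.asarray(ratings, dtype=int)
    n, r = x.shape
    if r < 2:
        raise ValueError("need at least two raters")
    w = weight_matrix(n_categories, kind)
    pairs = list(itertools.combinations(range(r), 2))

    p_agree, mad, msd, po_w, pe_w = [], [], [], [], []
    for a, b in pairs:
        xa, xb = x[:, a], x[:, b]
        diff = xa - xb
        p_agree.append(np.mean(diff == 0))   # proportion of exact agreement
        mad.append(np.mean(np.abs(diff)))    # mean absolute deviation
        msd.append(np.mean(diff**2))         # mean squared deviation
        po_w.append(np.mean(w[xa, xb]))      # observed weighted agreement
        # Chance agreement from this pair's own marginal distributions
        pa = np.bincount(xa, minlength=n_categories) / n
        pb = np.bincount(xb, minlength=n_categories) / n
        pe_w.append(pa @ w @ pb)

    po, pe = np.mean(po_w), np.mean(pe_w)
    return {
        "prop_agreement": float(np.mean(p_agree)),
        "mad": float(np.mean(mad)),
        "msd": float(np.mean(msd)),
        "weighted_kappa": float((po - pe) / (1.0 - pe)),
    }


ratings = [[0, 0, 0], [1, 1, 2], [2, 2, 2], [0, 1, 1]]  # toy data: 4 patients x 3 raters
print(multirater_measures(ratings, n_categories=3))
# prop_agreement ≈ 0.667, mad ≈ 0.333, msd ≈ 0.333, weighted_kappa ≈ 0.758
```

The proportion of disagreement is simply `1 - prop_agreement`. Averaging over rater pairs keeps every coefficient on the same scale as its two-rater counterpart.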
After highlighting the difference between the concepts of agreement and reliability, the paper provides a clear and simple interpretation of the agreement and reliability coefficients. To construct Wald confidence intervals, the large-sample variance of each coefficient is obtained with the delta method: it is presented when available in the literature and derived otherwise. Finally, a procedure is provided to determine the minimum number of raters and patients needed to limit the uncertainty associated with the sampling process. All methods are implemented in an R package and a Shiny application to circumvent the limitations of current software.
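The variance and planning steps can be illustrated in the simplest setting. The sketch below assumes two raters and multinomial sampling of patients over the K × K table of joint rating proportions p. It applies the delta method numerically (a central-difference gradient combined with the multinomial covariance (diag(p) − ppᵀ)/n), builds a Wald interval, and inverts the half-width formula to find the smallest number of patients for an anticipated table. It does not reproduce the paper's closed-form variances for more than two raters, nor its procedure for choosing the number of raters; the function names and the anticipated table are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def weighted_kappa_2(p, w):
    """Weighted kappa for two raters from a K x K table of joint proportions p."""
    po = np.sum(w * p)  # observed weighted agreement
    pe = np.sum(w * np.outer(p.sum(axis=1), p.sum(axis=0)))  # chance agreement from marginals
    return (po - pe) / (1.0 - pe)


def delta_variance(g, p, n, eps=1e-6):
    """Delta-method variance of g(p_hat) when n patients are sampled multinomially."""
    p = np.asarray(p, dtype=float)
    flat = p.ravel()
    grad = np.empty_like(flat)
    for k in range(flat.size):  # central-difference gradient of g
        step = np.zeros_like(flat)
        step[k] = eps
        grad[k] = (g((flat + step).reshape(p.shape))
                   - g((flat - step).reshape(p.shape))) / (2 * eps)
    cov = (np.diag(flat) - np.outer(flat, flat)) / n  # multinomial covariance of p_hat
    return float(grad @ cov @ grad)


def wald_ci(estimate, variance, level=0.95):
    """Wald interval: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(variance)
    return estimate - half, estimate + half


def min_subjects(g, p, half_width, level=0.95):
    """Smallest n whose Wald half-width is at most half_width for the anticipated table p."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    unit_var = delta_variance(g, p, n=1)  # the variance scales as 1/n
    return math.ceil(z**2 * unit_var / half_width**2)


# Hypothetical anticipated table for two raters and three ordered categories
p = np.array([[0.20, 0.05, 0.00],
              [0.05, 0.30, 0.05],
              [0.00, 0.05, 0.30]])
idx = np.arange(3)
w = 1.0 - (idx[:, None] - idx[None, :]) ** 2 / 4  # quadratic weights for K = 3


def kappa(q):
    return weighted_kappa_2(q, w)


print(wald_ci(kappa(p), delta_variance(kappa, p, n=50)))
print(min_subjects(kappa, p, half_width=0.10))
```

Because the delta-method variance scales as 1/n, `min_subjects` only needs the variance at n = 1. With more than two raters the variance typically also depends on the number of raters, so a full planning procedure has to search over both dimensions.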
The present paper complements existing guidelines, such as the Guidelines for Reporting Reliability and Agreement Studies (GRRAS), to improve the quality of reliability and agreement studies of clinical tests. Furthermore, we provide open-source software for researchers with minimal programming skills.