Department of Quantitative Biomedical Sciences, Geisel School of Medicine, Dartmouth College, 1 Rope Ferry Road, Hanover, 03755, NH, USA.
The Dartmouth Institute, Geisel School of Medicine, Dartmouth College, 1 Rope Ferry Road, Hanover, 03755, NH, USA.
BMC Med Res Methodol. 2018 Sep 12;18(1):93. doi: 10.1186/s12874-018-0550-6.
Intraclass correlation coefficients (ICC) are recommended for the assessment of the reliability of measurement scales. However, the ICC is subject to a variety of statistical assumptions such as normality and stable variance, which are rarely considered in health applications.
A Bayesian approach using hierarchical regression and variance-function modeling is proposed to estimate the ICC, with emphasis on accounting for heterogeneous variances across a measurement scale. As an application, we review the use of an ICC to evaluate the reliability of Observer OPTION, an instrument in which trained raters evaluate the level of Shared Decision Making between clinicians and patients. The study used two raters to evaluate recordings of 311 clinical encounters across three studies evaluating the impact of using a Personal Decision Aid over usual care. We focus particularly on deriving an estimate for the ICC when the data comprise multiple studies.
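As a reading aid, the quantity being estimated can be sketched with a simple random-effects model. The notation below (study s, encounter i, rater j, and the variance symbols) is an assumption introduced here for illustration, not taken from the paper, and it omits the variance-function component that lets the variances change across the measurement scale.

```latex
% Illustrative notation only: s = study, i = encounter, j = rater.
\begin{align}
  y_{sij} &= \mu_s + b_{si} + \varepsilon_{sij}, \\
  b_{si} &\sim \mathcal{N}\!\left(0,\ \sigma^2_{b,s}\right), \qquad
  \varepsilon_{sij} \sim \mathcal{N}\!\left(0,\ \sigma^2_{w,s}\right), \\
  \mathrm{ICC}_s &= \frac{\sigma^2_{b,s}}{\sigma^2_{b,s} + \sigma^2_{w,s}}.
\end{align}
```

Under this sketch, assuming the between-encounter and within-encounter variances are the same in every study collapses the study-specific values into a single within-study ICC.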
The results demonstrate that the ICC varies substantially across studies and across patient-physician encounters within studies. Using the new framework we developed, the study-specific ICCs were estimated to be 0.821, 0.295, and 0.644. If the within- and between-encounter variances were assumed to be the same across studies, the estimated within-study ICC was 0.609. If heteroscedasticity was not properly adjusted for, the within-study ICC estimate was inflated to as high as 0.640. Finally, if the data were pooled across studies without accounting for the variability between studies, ICC estimates were further inflated by approximately 0.02, while formally allowing for between-study variation in the ICC inflated its estimated value by approximately 0.066 to 0.072, depending on the model.
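The inflation from pooling can be illustrated with a minimal sketch. It uses the classical one-way ANOVA estimator of the ICC rather than the Bayesian hierarchical model described here, and the ratings are hypothetical numbers made up for illustration, not data from the study. The function name `icc_oneway` is likewise an assumption.

```python
from statistics import mean


def icc_oneway(ratings):
    """One-way random-effects ICC from a list of per-encounter rating tuples.

    Classical ANOVA estimator: (MSB - MSW) / (MSB + (k - 1) * MSW),
    where k is the number of raters per encounter.
    """
    n = len(ratings)       # number of encounters
    k = len(ratings[0])    # number of raters per encounter
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-encounter and within-encounter sums of squares.
    ssb = k * sum((m - grand) ** 2 for m in row_means)
    ssw = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row)
    msb = ssb / (n - 1)
    msw = ssw / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


# Hypothetical two-rater scores; study B has the same spread as study A
# but a mean score that is 20 points higher.
study_a = [(10, 12), (14, 12), (13, 15), (17, 15)]
study_b = [(a + 20, b + 20) for a, b in study_a]

icc_a = icc_oneway(study_a)
icc_b = icc_oneway(study_b)
icc_pooled = icc_oneway(study_a + study_b)  # ignores the study grouping

print(round(icc_a, 3), round(icc_b, 3), round(icc_pooled, 3))
# prints: 0.625 0.625 0.983
```

In this toy example the difference in study means is absorbed into the between-encounter variance when the studies are pooled, so the pooled estimate is far higher than either within-study estimate.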
We demonstrated that misuse of the ICC statistic under common assumption violations leads to misleading and likely inflated estimates of interrater reliability. A statistical analysis that overcomes these violations by expanding the standard statistical model to account for them leads to estimates that better reflect a measurement scale's reliability while maintaining ease of interpretation. Bayesian methods are particularly well suited to estimating the expanded statistical model.