Research Institute of Child Development and Education, University of Amsterdam.
Psychol Methods. 2024 Oct;29(5):967-979. doi: 10.1037/met0000516. Epub 2022 Sep 1.
Several intraclass correlation coefficients (ICCs) are available to assess the interrater reliability (IRR) of observational measurements. Selecting an ICC is complicated, and existing guidelines have three major limitations. First, they do not discuss incomplete designs, in which raters partially vary across subjects. Second, they provide no coherent perspective on the error variance in an ICC, clouding the choice between the available coefficients. Third, the distinction between fixed and random raters is often misunderstood. Based on generalizability theory (GT), we provide updated guidelines on selecting an ICC for IRR, which are applicable to both complete and incomplete observational designs. We challenge conventional wisdom about ICCs for IRR by claiming that raters should seldom (if ever) be considered fixed. Also, we clarify how to interpret ICCs in the case of unbalanced and incomplete designs. We explain four choices a researcher needs to make when selecting an ICC for IRR, and guide researchers through these choices by means of a flowchart, which we apply to three empirical examples from clinical and developmental domains. In the Discussion, we provide guidance in reporting, interpreting, and estimating ICCs, and propose future directions for research into the ICCs for IRR. (PsycInfo Database Record (c) 2024 APA, all rights reserved).
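To illustrate why the definition of the error variance matters when choosing a coefficient, the following minimal sketch computes two common single-rater ICCs from the same data. It is not taken from the article: it assumes a complete design (every rater scores every subject), treats raters as random, and uses the textbook two-way ANOVA mean-square estimators. The function names `variance_components` and `icc_single` are hypothetical, and the sketch does not cover the incomplete or unbalanced designs the article addresses.

```python
import numpy as np

def variance_components(scores):
    """Estimate subject, rater, and residual variance from a complete
    subjects x raters score matrix via two-way ANOVA mean squares.
    Assumption: no missing cells (complete design)."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape                      # n subjects, k raters
    grand = x.mean()
    row_means = x.mean(axis=1)          # per-subject means
    col_means = x.mean(axis=0)          # per-rater means
    ms_subj = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_rater = n * np.sum((col_means - grand) ** 2) / (k - 1)
    resid = x - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))
    # Negative estimates are truncated at zero.
    var_subj = max((ms_subj - ms_err) / k, 0.0)
    var_rater = max((ms_rater - ms_err) / n, 0.0)
    return var_subj, var_rater, ms_err

def icc_single(scores):
    """Return (agreement, consistency) ICCs for a single rater.
    Agreement counts rater main effects as error; consistency does not."""
    s, r, e = variance_components(scores)
    agreement = s / (s + r + e)
    consistency = s / (s + e)
    return agreement, consistency

# Hypothetical data: rater 2 always scores one point higher than rater 1.
example = [[1, 2], [2, 3], [3, 4], [4, 5]]
agreement, consistency = icc_single(example)
print(f"agreement={agreement:.3f}, consistency={consistency:.3f}")
```

In this made-up example the two raters rank subjects identically but differ by a constant offset, so the consistency coefficient is 1 while the agreement coefficient is lower. Whether systematic rater differences belong in the error term is one of the decisions that determines which coefficient is appropriate.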