Lai Mark H C
Department of Psychology.
Psychol Methods. 2021 Feb;26(1):90-102. doi: 10.1037/met0000287. Epub 2020 Jul 16.
This article shows how the concept of reliability of composite scores, as defined in classical test theory, can be extended to the context of multilevel modeling. In particular, it discusses the contributions and limitations of the various level-specific reliability indices proposed by Geldhof, Preacher, and Zyphur (2014), denoted as ω̃^b and ω̃^w (and also α̃^b and α̃^w). One major limitation of those indices is that they are quantities for latent, unobserved level-specific composite scores, and are not suitable for observed composites at different levels. As illustrated using simulated data in this article, ω̃^b can drastically overestimate the true reliability of between-level composite scores (i.e., observed cluster means). Another limitation is that the development of those indices did not consider the recent conceptual development on construct meanings in multilevel modeling (Stapleton & Johnson, 2019; Stapleton, Yang, & Hancock, 2016). To address the second limitation, this article defines reliability indices (ω^2l, ω^b, ω^w, α^2l, α^b, α^w) for three types of multilevel observed composite scores measuring various multilevel constructs: individual, configural, shared, and within-cluster. The article also shows how researchers can obtain sample point and interval estimates using the derived formulas and the provided R and Mplus code. In addition, a large-scale national data set was used to illustrate the proposed methods for estimating reliability for the three types of multilevel composite scores, and practical recommendations on when different indices should be reported are provided. (PsycInfo Database Record (c) 2021 APA, all rights reserved).
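The gap between a latent between-level reliability and the reliability of observed cluster means can be illustrated numerically. The sketch below is not the article's R or Mplus code and does not reproduce its derived formulas. It assumes a simplified two-level one-factor model with unit-weighted sum scores, uncorrelated residuals, and a common cluster size n, and every function name and parameter value is hypothetical. Under those assumptions, the variance of an observed cluster mean carries the within-level variance divided by n in addition to the between-level residual variance, which a latent between-level index leaves out.

```python
# Illustrative sketch only: simplified two-level one-factor model,
# unit-weighted composite, uncorrelated residuals, equal cluster size n.
# All names and numbers are hypothetical.

def latent_between_omega(loadings_b, resid_b, psi_b=1.0):
    """Omega for the latent between-level composite (ignores sampling of members)."""
    true_var = sum(loadings_b) ** 2 * psi_b
    return true_var / (true_var + sum(resid_b))


def cluster_mean_omega(loadings_b, resid_b, loadings_w, resid_w, n,
                       psi_b=1.0, psi_w=1.0):
    """Reliability of observed cluster means of the composite.

    The denominator adds the within-level composite variance divided by
    the cluster size n, because an observed cluster mean averages over
    only n members.
    """
    true_var = sum(loadings_b) ** 2 * psi_b
    between_error = sum(resid_b)
    within_var = sum(loadings_w) ** 2 * psi_w + sum(resid_w)
    return true_var / (true_var + between_error + within_var / n)


# Hypothetical parameter values for a four-item scale.
loadings_b = [0.5, 0.5, 0.5, 0.5]
resid_b = [0.05, 0.05, 0.05, 0.05]
loadings_w = [0.7, 0.7, 0.7, 0.7]
resid_w = [0.5, 0.5, 0.5, 0.5]

latent = latent_between_omega(loadings_b, resid_b)
observed = cluster_mean_omega(loadings_b, resid_b, loadings_w, resid_w, n=5)
print(f"latent between-level omega: {latent:.3f}")
print(f"observed cluster-mean reliability (n = 5): {observed:.3f}")
```

With these made-up values the latent index is far higher than the cluster-mean reliability, and the two converge only as the cluster size grows large, which is consistent with the overestimation the abstract describes.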