Department of Methodology and Statistics, School of Social and Behavioral Sciences, Tilburg University, PO Box 90153, 5000 LE, Tilburg, The Netherlands.
Behav Res Methods. 2022 Oct;54(5):2114-2145. doi: 10.3758/s13428-021-01690-7. Epub 2021 Dec 15.
In the social sciences, the study of group differences in latent constructs is ubiquitous. These constructs are generally measured by means of scales composed of ordinal items. To compare these constructs across groups, one crucial requirement is that they are measured equivalently or, in technical terms, that measurement invariance (MI) holds across the groups. This study compared the performance of scale- and item-level approaches based on multiple group categorical confirmatory factor analysis (MG-CCFA) and multiple group item response theory (MG-IRT) in testing MI with ordinal data. In general, the results of the simulation studies showed that MG-CCFA-based approaches outperformed MG-IRT-based approaches when testing MI at the scale level, whereas, at the item level, the best-performing approach depended on the tested parameter (i.e., loadings or thresholds). That is, when testing the equivalence of loadings, the likelihood ratio test provided the best trade-off between true-positive rate and false-positive rate, whereas, when testing the equivalence of thresholds, the χ² test outperformed the other testing strategies. In addition, the performance of MG-CCFA's fit measures, such as RMSEA and CFI, seemed to depend largely on the length of the scale, especially when MI was tested at the item level. General caution is recommended when using these measures, especially when MI is tested for each item individually.
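As a rough illustration of the quantities named in the abstract, the following Python sketch computes RMSEA and CFI from model χ² statistics using their common textbook definitions, plus a likelihood ratio test between a constrained (invariant) model and a less constrained model nested above it. It is not the authors' implementation. The function names (`rmsea`, `cfi`, `chi2_sf`, `likelihood_ratio_test`) and all input numbers are hypothetical. Several assumptions matter. Software packages differ in details (for example, N versus N − 1 in the RMSEA denominator, or adjustments for multiple groups). The likelihood ratio test assumes both models are fit by maximum likelihood. With ordinal data estimated by robust weighted least squares, as is common in MG-CCFA, the naive χ² difference is not χ²-distributed, and a scaled difference test would be needed instead.

```python
import math

# Hypothetical helper names; common textbook formulas, not the study's code.


def rmsea(chi2: float, df: int, n: int) -> float:
    """Single-group RMSEA with N - 1 in the denominator (software conventions vary)."""
    if df <= 0 or n <= 1:
        raise ValueError("need df > 0 and n > 1")
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Comparative fit index of a model relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)  # equals max(base, model, 0) noncentrality
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def _upper_reg_gamma(a: float, x: float) -> float:
    """Q(a, x) = Gamma(a, x) / Gamma(a): series for small x, continued fraction otherwise."""
    if x <= 0:
        return 1.0
    prefactor = math.exp(a * math.log(x) - x - math.lgamma(a))
    if x < a + 1:
        # Series for the lower function P(a, x); Q = 1 - P.
        term = total = 1.0 / a
        n = 0
        while abs(term) > abs(total) * 1e-16:
            n += 1
            term *= x / (a + n)
            total += term
        return 1.0 - prefactor * total
    # Modified Lentz evaluation of the continued fraction for Q(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return prefactor * h


def chi2_sf(stat: float, df: float) -> float:
    """Upper-tail p-value of a chi-square statistic: Q(df / 2, stat / 2)."""
    return _upper_reg_gamma(df / 2.0, stat / 2.0)


def likelihood_ratio_test(loglik_constrained: float, loglik_free: float, df_diff: int):
    """LR test of a constrained (invariant) model nested in a freer one, both fit by ML."""
    stat = max(-2.0 * (loglik_constrained - loglik_free), 0.0)
    return stat, chi2_sf(stat, df_diff)


# Illustrative, made-up inputs (not results from the study).
print(round(rmsea(150.0, 100, 501), 4))              # 0.0316
print(round(cfi(150.0, 100, 1120.0, 120), 4))        # 0.95
stat, p = likelihood_ratio_test(-5012.5, -5010.0, 2)
print(stat, round(p, 4))                             # 5.0 0.0821
```

Both fit indices are functions of the model degrees of freedom, which grow with the number of items, so their values may respond to scale length as well as to misfit, in line with the caution expressed in the abstract.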