Brandt Holger, Chen Siyuan Marco, Bauer Daniel J
Methods Center, University of Tübingen.
Department of Psychology, University of North Carolina at Chapel Hill.
Psychol Methods. 2025 Jun;30(3):482-512. doi: 10.1037/met0000552. Epub 2023 Jun 8.
Measurement invariance (MI) is one of the main psychometric requirements for analyses that focus on potentially heterogeneous populations. MI allows researchers to compare latent factor scores across persons from different subgroups; if a measure is not invariant across all items and persons, such comparisons may be misleading. If full MI does not hold, further testing may identify problematic items that show differential item functioning (DIF). Most methods developed to test for DIF have focused on simple scenarios, often comparisons across two groups. In practical applications, this is an oversimplification if many grouping variables (e.g., gender, race) or continuous covariates (e.g., age) exist that might influence the measurement properties of items; these variables are often correlated, making traditional tests that consider each variable separately less useful. Here, we propose the application of Bayesian Moderated Nonlinear Factor Analysis to overcome the limitations of traditional approaches to detecting DIF. We investigate how modern Bayesian shrinkage priors can be used to identify DIF items in situations with many groups and continuous covariates. We compare the performance of lasso-type, spike-and-slab, and global-local shrinkage priors (e.g., horseshoe) to standard normal and small variance priors. Results indicate that spike-and-slab and lasso priors outperform the other priors. Horseshoe priors provide slightly lower power than lasso and spike-and-slab priors. Small variance priors result in very low power to detect DIF with sample sizes below 800, and normal priors may produce severely inflated type I error rates. We illustrate the approach with data from the PISA 2018 study. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
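To make the comparison of priors more concrete, the following is a minimal Python sketch of how the five prior families named in the abstract distribute probability mass for a single DIF effect. It is not the authors' implementation and does not fit any factor model. Every scale and mixing weight (a small-variance SD of 0.1, a Laplace scale of 1, a 50/50 spike-and-slab mixture with spike SD 0.01, and a horseshoe with its global scale fixed at 1) is an assumption chosen only for illustration, as are the function names `draw` and `summarize`.

```python
import math
import random

PRIORS = ["normal", "small_variance", "lasso", "spike_and_slab", "horseshoe"]


def draw(prior, rng):
    """Draw one DIF effect from the named prior (all scales are assumed)."""
    if prior == "normal":
        # Standard normal: no shrinkage toward zero
        return rng.gauss(0.0, 1.0)
    if prior == "small_variance":
        # Normal with a small SD (0.1 is an assumed value)
        return rng.gauss(0.0, 0.1)
    if prior == "lasso":
        # Laplace(0, 1): exponential magnitude with a random sign
        return rng.choice((-1.0, 1.0)) * rng.expovariate(1.0)
    if prior == "spike_and_slab":
        # Assumed 50/50 mixture of a narrow spike N(0, 0.01^2) and a slab N(0, 1)
        sd = 0.01 if rng.random() < 0.5 else 1.0
        return rng.gauss(0.0, sd)
    if prior == "horseshoe":
        # Local scale ~ half-Cauchy(0, 1); global scale fixed at 1 (assumed)
        local_scale = abs(math.tan(math.pi * (rng.random() - 0.5)))
        return rng.gauss(0.0, local_scale)
    raise ValueError(f"unknown prior: {prior}")


def summarize(n=20000, seed=1):
    """Return {prior: (share of draws near zero, share of draws in the tails)}."""
    rng = random.Random(seed)
    out = {}
    for prior in PRIORS:
        draws = [draw(prior, rng) for _ in range(n)]
        near_zero = sum(abs(d) < 0.05 for d in draws) / n
        tail = sum(abs(d) > 3.0 for d in draws) / n
        out[prior] = (near_zero, tail)
    return out


if __name__ == "__main__":
    for prior, (near_zero, tail) in summarize().items():
        print(f"{prior:15s} P(|d|<0.05)={near_zero:.3f}  P(|d|>3)={tail:.3f}")
```

Under these assumed settings, the shrinkage priors place more mass close to zero than the standard normal, while the lasso and horseshoe priors also keep heavier tails, which leaves room for a few large DIF effects. The small variance prior concentrates mass near zero but has almost no tail mass, which is consistent with the low power reported above for smaller samples.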