The University of Iowa, 8 North Shore Drive, Edwardsville, IL, 62025, USA.
Amazon Web Services, Monroe St #1400, Chicago, IL, 60606, USA.
Psychometrika. 2024 Mar;89(1):4-41. doi: 10.1007/s11336-024-09965-6. Epub 2024 Apr 5.
Differential item functioning (DIF) is a standard analysis for every testing company. Research has demonstrated that DIF can result when test items measure different ability composites and the groups being examined for DIF exhibit distinct underlying ability distributions on those composite abilities. In this article, we examine DIF from a two-dimensional multidimensional item response theory (MIRT) perspective. We begin by delving into the compensatory MIRT model, illustrating how items and the composites they measure can be represented graphically. Additionally, we discuss how estimated item parameters can vary based on the underlying latent ability distributions of the examinees. Analytical research highlighting the consequences of ignoring dimensionality and applying unidimensional IRT models, in which the two-dimensional latent space is mapped onto a unidimensional scale, is reviewed. Next, we investigate three different approaches to understanding DIF from a MIRT standpoint: 1. Analytically derived uniform and nonuniform DIF: when two groups of interest have different two-dimensional ability distributions and a unidimensional model is estimated. 2. Accounting for the complete latent ability space: we emphasize the importance of considering the entire latent ability space when using conditional DIF approaches, which mitigates DIF effects. 3. Scenario-based DIF: even when the underlying two-dimensional distributions are identical for two groups, differing problem-solving approaches can still lead to DIF. Modern software programs make routine DIF procedures for comparing response data from two identified groups of interest straightforward. The real challenge is to identify why DIF occurs for flagged items. Thus, as a closing challenge, we present four items (Appendix A) from a standardized test and invite readers to identify which group was favored by a DIF analysis.
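As a rough illustration of the mechanism the abstract describes, the sketch below simulates a compensatory two-dimensional (M2PL) item and two groups that differ only in their theta-2 distribution. Conditioning on theta-1 alone, as a unidimensional DIF analysis effectively does, leaves a residual group difference in success probability, i.e., apparent DIF. All item parameters, group distributions, and the conditioning band are illustrative assumptions, not values from the article.

```python
import math
import random

def p_correct(theta1, theta2, a1=1.0, a2=1.2, d=-0.3):
    """Compensatory two-dimensional MIRT (M2PL) response probability:
    P = 1 / (1 + exp(-(a1*theta1 + a2*theta2 + d))).
    Discriminations a1, a2 and intercept d are illustrative assumptions."""
    return 1.0 / (1.0 + math.exp(-(a1 * theta1 + a2 * theta2 + d)))

def simulate_group(mu2, n=50_000, seed=0):
    """Draw theta1 ~ N(0, 1) for both groups and theta2 ~ N(mu2, 1).
    Return the proportion correct among examinees with theta1 near 0,
    mimicking a DIF analysis that conditions on theta1 only."""
    rng = random.Random(seed)
    hits, total = 0, 0
    for _ in range(n):
        t1 = rng.gauss(0.0, 1.0)
        t2 = rng.gauss(mu2, 1.0)
        if abs(t1) < 0.25:  # condition on (approximately) equal theta1
            total += 1
            if rng.random() < p_correct(t1, t2):
                hits += 1
    return hits / total

ref = simulate_group(mu2=0.0, seed=1)    # reference group
foc = simulate_group(mu2=-0.5, seed=2)   # focal group: lower mean on theta2
print(f"reference: {ref:.3f}  focal: {foc:.3f}  gap: {ref - foc:.3f}")
```

Even though both groups are matched on theta-1, the focal group's lower theta-2 mean depresses its conditional success rate on this item, which is exactly the composite-ability source of DIF the abstract attributes to ignoring the second dimension.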