Tay Louis, Huang Qiming, Vermunt Jeroen K
Purdue University, West Lafayette, IN, USA.
Tilburg University, Tilburg, Netherlands.
Educ Psychol Meas. 2016 Feb;76(1):22-42. doi: 10.1177/0013164415579488. Epub 2015 Apr 6.
In large-scale testing, multigroup approaches are of limited use for assessing differential item functioning (DIF) across multiple variables because DIF is examined for each variable separately. In contrast, the item response theory with covariates (IRT-C) procedure can examine DIF across multiple variables (covariates) simultaneously. To assess the utility of the IRT-C procedure, we conducted a simulation study. Using SAT data for realistic parameters, uniform DIF was simulated on three covariates: gender (dichotomous), race/ethnicity (categorical), and income (continuous). Simulations were conducted across several conditions: two test lengths (14 items, 21 items), four sample sizes (5,000, 10,000, 20,000, 40,000), and two DIF effect sizes (medium, large). The IRT-C procedure accurately recovered the latent means and the three-parameter logistic model parameters with a substantial sample size of 20,000. Type I error rates were well controlled at the nominal rates across the sample sizes. Good power (>.80) to detect DIF across all covariates was observed when the sample size was 20,000 for the large DIF effect size and 40,000 for the medium DIF effect size. Practical implications for the use of the IRT-C procedure are discussed.
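As a rough illustration of the kind of data-generating setup the abstract describes, the sketch below simulates three-parameter logistic responses in which covariates shift the item logit directly (uniform DIF). It is a minimal sketch under stated assumptions, not the authors' simulation code: the function name, the number of race/ethnicity categories, the covariate distributions, the item parameters, and the DIF magnitudes are all hypothetical, and the latent trait is drawn from a standard normal regardless of covariates for simplicity.

```python
import numpy as np


def simulate_3pl_uniform_dif(n_persons, a, b, c, dif_effects, rng):
    """Simulate 0/1 responses from a 3PL model with uniform DIF.

    a, b, c      : arrays of length n_items (discrimination, difficulty, guessing).
    dif_effects  : array (n_items, 5); logit shifts for the covariate columns
                   [gender, race_1, race_2, race_3, income]. Zero rows mean no DIF.
    All covariate distributions below are hypothetical choices for illustration.
    """
    n_items = len(a)

    # Hypothetical covariates: dichotomous, categorical (4 levels), continuous.
    gender = rng.integers(0, 2, size=n_persons)
    race = rng.integers(0, 4, size=n_persons)
    income = rng.standard_normal(n_persons)  # treated as standardized

    # Dummy-code race/ethnicity, dropping category 0 as the reference group.
    race_dummies = np.eye(4)[race][:, 1:]
    covariates = np.column_stack([gender, race_dummies, income])

    # Simplifying assumption: the latent trait does not depend on covariates.
    theta = rng.standard_normal(n_persons)

    # Uniform DIF: covariates add a constant shift to the item logit,
    # independent of theta.
    logit = a * (theta[:, None] - b) + covariates @ dif_effects.T
    prob = c + (1.0 - c) / (1.0 + np.exp(-logit))

    responses = (rng.random((n_persons, n_items)) < prob).astype(int)
    return responses, covariates, theta


# Example: 14 items, 20,000 simulees, DIF favoring gender == 1 on item 0 only.
rng = np.random.default_rng(0)
n_items = 14
a = np.ones(n_items)
b = np.zeros(n_items)
c = np.full(n_items, 0.2)
dif = np.zeros((n_items, 5))
dif[0, 0] = 1.0  # hypothetical logit shift, not a value from the study

responses, covariates, theta = simulate_3pl_uniform_dif(20_000, a, b, c, dif, rng)
```

Comparing the proportion correct on item 0 between the two gender groups, and doing the same for an item with no shift, gives a quick sanity check that the DIF was induced where intended. Fitting the IRT-C model itself would require latent variable modeling software and is outside the scope of this sketch.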