Sari Halil Ibrahim, Huggins Anne Corinne
University of Florida, Gainesville, FL, USA.
Educ Psychol Meas. 2015 Aug;75(4):648-676. doi: 10.1177/0013164414549764. Epub 2014 Sep 12.
This study compares two methods of defining groups for the detection of differential item functioning (DIF): (a) pairwise comparisons and (b) composite group comparisons. We aim to emphasize and empirically support the notion that the choice between pairwise and composite group definitions reflects how one defines fairness in DIF studies. A simulation was conducted based on data from a 60-item ACT Mathematics test (ACT; Hanson & Béguin), with the unsigned area measure (Raju) as the DIF detection method. The study also includes an application to operational data and a comparison of observed Type I error rates and false discovery rates across the two methods of defining groups. Results indicate that the amount of flagged DIF or interpretations about DIF in all conditions were not the same across the two methods, and there may be some benefits to using composite group approaches. The results are discussed in connection to differing definitions of fairness. Recommendations for practice are made.
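As an illustration of the unsigned area measure named in the abstract, the following is a minimal sketch of the closed-form area between two item characteristic curves under a two-parameter logistic model, or a three-parameter model with a common lower asymptote c. It is not the study's code: the function names, the scaling constant D = 1.7, and all item parameter values are assumptions chosen for illustration. The comparison group can be a reference group (pairwise definition) or a composite of all groups (composite definition), and the parameters passed in would change accordingly.

```python
import math


def _softplus(x):
    """Numerically stable ln(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def raju_unsigned_area(a1, b1, a2, b2, c=0.0, D=1.7):
    """Unsigned area between two logistic item characteristic curves.

    a1, b1: discrimination and difficulty in the first group
    a2, b2: discrimination and difficulty in the second group
    c:      common lower asymptote (assumed equal in both groups)
    D:      scaling constant (1.7 is an assumption, not from the source)
    """
    db = b2 - b1
    da = a2 - a1
    if da == 0.0:
        # Equal discriminations: the curves do not cross, so the area
        # reduces to the absolute difficulty difference.
        return (1.0 - c) * abs(db)
    scale = D * a1 * a2
    return (1.0 - c) * abs(2.0 * da / scale * _softplus(scale * db / da) - db)


# Hypothetical item parameters for one item in three groups.
reference = (1.0, 0.0)
focal_groups = {"focal_1": (1.0, 0.5), "focal_2": (2.0, 0.0)}

# Pairwise definition: each focal group is compared with the reference group.
pairwise_areas = {
    name: raju_unsigned_area(*reference, *params)
    for name, params in focal_groups.items()
}
```

Under a composite definition, the first pair of arguments would instead hold the item parameters estimated for the composite group, and every group, including the one used as the reference here, would be compared against it. Larger areas indicate a larger discrepancy between the two curves; the threshold or significance test used for flagging is a separate choice not shown here.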