Sun Xiaojian, Wang Shimeng, Guo Lei, Xin Tao, Song Naiqing
School of Mathematics and Statistics, Southwest University, Chongqing, China.
Southwest University Branch, Collaborative Innovation Center of Assessment for Basic Education Quality, Chongqing, China.
Appl Psychol Meas. 2023 Jun;47(4):328-346. doi: 10.1177/01466216231174559. Epub 2023 May 13.
Items that exhibit differential item functioning (DIF) compromise the validity and fairness of a test. Previous studies have investigated DIF in the context of cognitive diagnostic assessment (CDA) and have proposed several DIF detection methods. Most of these methods are designed to detect DIF between two groups; however, empirical settings may involve more than two groups. To date, only a handful of studies have examined DIF across multiple groups in the CDA context. This study uses the generalized logistic regression (GLR) method to detect DIF items, with the estimated attribute profile as the matching criterion. A simulation study examines the performance of two GLR methods, the GLR-based Wald test (GLR-Wald) and the GLR-based likelihood ratio test (GLR-LRT), in detecting DIF items; results based on the ordinary Wald test are also reported for comparison. The results show that (1) both GLR-Wald and GLR-LRT control Type I error rates more reasonably than the ordinary Wald test in most conditions; (2) the GLR methods also produce higher empirical rejection rates than the ordinary Wald test in most conditions; and (3) using the estimated attribute profile as the matching criterion yields similar Type I error rates and empirical rejection rates for GLR-Wald and GLR-LRT. A real data example is also analyzed to illustrate the application of these DIF detection methods to multiple groups.
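The likelihood-ratio idea behind a multiple-group DIF test can be sketched for a single item as follows. This is a minimal illustration under stated assumptions, not the authors' GLR-LRT implementation: it assumes responses are scored 0/1, that each examinee's estimated attribute profile is treated as a categorical matching variable, and that the alternative model gives every group its own correct-response probability within each profile (so uniform and non-uniform DIF are tested jointly). Under those assumptions both logistic models are saturated in their categorical predictors, the maximum-likelihood fits are simply observed proportions, and the likelihood ratio statistic reduces to G^2 with sum over profiles of (G_p - 1) degrees of freedom, where G_p is the number of groups observed in profile p. The function names `multigroup_dif_lrt` and `chi2_sf` and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), df a positive integer.

    Uses Q(x; 1) = erfc(sqrt(x/2)), Q(x; 2) = exp(-x/2) and the recurrence
    Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q


def _xlog_ratio(count, expected):
    # count * log(count / expected), with the convention 0 * log(0) = 0
    return count * math.log(count / expected) if count > 0 else 0.0


def multigroup_dif_lrt(records):
    """Hypothetical LRT of DIF for one item across two or more groups.

    records: iterable of (group, profile, score), score in {0, 1}, where
    profile is the estimated attribute profile used for matching.
    Null model:        P(correct) depends on the profile only.
    Alternative model: P(correct) depends on the profile and the group.
    """
    n = defaultdict(int)  # (profile, group) -> number of examinees
    y = defaultdict(int)  # (profile, group) -> number of correct responses
    for group, profile, score in records:
        n[(profile, group)] += 1
        y[(profile, group)] += score

    groups_in_profile = defaultdict(list)
    for profile, group in n:
        groups_in_profile[profile].append(group)

    g2, df = 0.0, 0
    for profile, groups in groups_in_profile.items():
        n_p = sum(n[(profile, g)] for g in groups)
        p0 = sum(y[(profile, g)] for g in groups) / n_p  # pooled (null) proportion
        for g in groups:
            n_c, y_c = n[(profile, g)], y[(profile, g)]
            g2 += 2 * (_xlog_ratio(y_c, n_c * p0)
                       + _xlog_ratio(n_c - y_c, n_c * (1 - p0)))
        df += len(groups) - 1  # extra group parameters within this profile
    return g2, df, chi2_sf(g2, df)


if __name__ == "__main__":
    # Hypothetical toy data: three groups, two estimated profiles, 20 examinees per cell.
    data = []
    for group, correct in [("A", 18), ("B", 17), ("C", 6)]:
        data += [(group, "11", 1)] * correct + [(group, "11", 0)] * (20 - correct)
        data += [(group, "00", 1)] * 5 + [(group, "00", 0)] * 15
    g2, df, p = multigroup_dif_lrt(data)
    print(f"G2 = {g2:.2f}, df = {df}, p = {p:.4g}")
```

In the toy data, group C answers far less often correctly than groups A and B among examinees estimated to hold profile "11", so the test returns df = 4 and a p-value well below .05. Because every profile-by-group cell gets its own parameter, this saturated version becomes unstable when cells are sparse; a more parsimonious logistic parameterization of the profile effect would require an iterative maximum-likelihood fit (e.g., Newton-Raphson) instead of closed-form proportions.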