DeMars Christine E, Jurich Daniel P
James Madison University, Harrisonburg, VA, USA.
National Board of Medical Examiners, Philadelphia, PA, USA.
Educ Psychol Meas. 2015 Aug;75(4):610-633. doi: 10.1177/0013164414554082. Epub 2014 Oct 20.
In educational testing, differential item functioning (DIF) statistics must be accurately estimated to ensure the appropriate items are flagged for inspection or removal. This study showed how using the Rasch model to estimate DIF may introduce considerable bias in the results when there are large group differences in ability (impact) and the data follow a three-parameter logistic model. With large group ability differences, difficult non-DIF items appeared to favor the focal group and easy non-DIF items appeared to favor the reference group. Correspondingly, the effect sizes for DIF items were biased. These effects were mitigated when data were coded as missing for item-examinee encounters in which the person measure was considerably lower than the item location. Explanation of these results is provided by illustrating how the item response function becomes differentially distorted by guessing depending on the groups' ability distributions. In terms of practical implications, results suggest that measurement practitioners should not trust the DIF estimates from the Rasch model when there is a large difference in ability and examinees are potentially able to answer items correctly by guessing, unless data from examinees poorly matched to the item difficulty are coded as missing.
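For reference (not stated in the abstract itself), the distortion the authors describe can be seen by contrasting the standard forms of the two models; the notation here, with person ability θ_j, item location b_i, discrimination a_i, and lower asymptote (guessing) c_i, is conventional rather than taken from the article:

Rasch model: P(X_{ij}=1 \mid \theta_j) = \dfrac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)}

Three-parameter logistic (3PL) model: P(X_{ij}=1 \mid \theta_j) = c_i + (1 - c_i)\,\dfrac{\exp\!\bigl(a_i(\theta_j - b_i)\bigr)}{1 + \exp\!\bigl(a_i(\theta_j - b_i)\bigr)}

When θ_j falls well below b_i, the 3PL probability levels off at c_i rather than approaching zero, which a Rasch item response function cannot reproduce; how strongly the fitted Rasch curve is pulled by this region depends on where each group's ability distribution lies, consistent with the abstract's account of why the bias appears under impact and why treating poorly matched item-examinee encounters as missing mitigates it.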