Wang Wen-Chung
Hong Kong Institute of Education, Department of Educational Psychology, Counseling and Learning Needs, 10 Lo Ping Road, Tai Po, New Territories, Hong Kong.
J Appl Meas. 2008;9(4):387-408.
This study addresses several important issues in the assessment of differential item functioning (DIF). It begins with the definition of DIF, the effectiveness of using item fit statistics to detect DIF, and the linear modeling of DIF in dichotomous items, polytomous items, facets, and testlet-based items. Because a common metric across groups of test-takers is a prerequisite for DIF assessment, this study reviews three methods of establishing such a metric: the equal-mean-difficulty method, the all-other-item method, and the constant-item (CI) method. A small simulation demonstrates the superiority of the CI method over the others. Because the CI method relies on the correct specification of DIF-free items to serve as anchors, a method for identifying such items is recommended and its effectiveness is illustrated through a simulation. Finally, this study discusses how to assess the practical significance of DIF at both the item and test levels.
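The constant-item (CI) idea can be sketched as follows. This is a minimal illustration, not the paper's procedure: it assumes item difficulties have already been estimated separately for a reference and a focal group (e.g., under a Rasch model), so the two sets of estimates differ by an arbitrary metric shift. The function name `ci_dif` and the toy numbers are invented for the example.

```python
def ci_dif(ref_b, foc_b, anchor_idx):
    """Per-item DIF estimates under constant-item anchoring.

    ref_b, foc_b : item difficulties from separate group calibrations
    anchor_idx   : indices of items assumed to be DIF-free (the anchors)
    """
    # The metric shift between the two groups is estimated from the
    # anchors only: the mean difficulty difference on DIF-free items.
    shift = sum(foc_b[i] - ref_b[i] for i in anchor_idx) / len(anchor_idx)
    # DIF for each item = focal-minus-reference difficulty difference,
    # net of the metric shift (anchors should come out near zero).
    return [(f - r) - shift for r, f in zip(ref_b, foc_b)]

# Toy data: items 0-2 serve as anchors; the focal group's metric is
# shifted by +0.2 logits overall, and item 3 is additionally 0.5
# logits harder for the focal group (i.e., it exhibits DIF).
ref = [-1.0, 0.0, 1.0, 0.5]
foc = [-0.8, 0.2, 1.2, 1.2]
print(ci_dif(ref, foc, anchor_idx=[0, 1, 2]))
# ≈ [0.0, 0.0, 0.0, 0.5] — only item 3 shows DIF
```

Note the design point the abstract makes: if the anchors are misspecified (some of them actually contain DIF), the estimated shift is contaminated and DIF estimates for all items are biased, which is why a method for identifying genuinely DIF-free anchor items matters.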