Smith Richard M, Suh Kyunghee K
P. O. Box 1283, Maple Grove, MN 55311, USA.
J Appl Meas. 2003;4(2):153-63.
The invariance of the estimated parameters across variation in the incidental parameters of a sample is one of the most important properties of Rasch measurement models. This is the property that allows the equating of test forms and the use of computer adaptive testing. It necessarily follows that, in Rasch models, if the data fit the model, then the estimates of the parameters of interest must be invariant across sub-samples of the items or persons. This study investigates the degree to which the INFIT and OUTFIT item fit statistics in WINSTEPS detect violations of the invariance property of Rasch measurement models. The test in this study is an 80-item multiple-choice test used to assess mathematics competency. The WINSTEPS analysis of the dichotomous results, based on a sample of 2,000 from a very large number of students who took the exam, indicated that only 7 of the 80 items misfit using the 1.3 mean square criterion advocated by Linacre and Wright. Subsequent calibration of separate samples of 1,000 students from the upper and lower thirds of the person raw score distribution, followed by a t-test comparison of the item calibrations, indicated that the item difficulties for 60 of the 80 items were more than 2 standard errors apart. The separate calibration t-values ranged from +21.00 to -7.00, with the t-test values for 41 of the 80 comparisons either larger than +5 or smaller than -5. Clearly, these data do not exhibit the invariance of the item parameters expected if the data fit the model. Yet the INFIT and OUTFIT mean squares are completely insensitive to the lack of invariance in the item parameters. If the OUTFIT ZSTD from WINSTEPS were used with a critical value of | t | > 2.0, then 56 of the 60 items identified by the separate calibration t-test would be identified as misfitting. A fourth measure of misfit, the between ability-group item fit statistic, identified 69 items as misfitting when a critical value of t > 2.0 was used.
Clearly, relying solely on the INFIT and OUTFIT mean squares in WINSTEPS to assess the fit of the data to the model would cause one to miss one of the most important threats to the usefulness of the measurement model.
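The separate calibration comparison described in the abstract can be illustrated with a minimal Python sketch. It assumes the commonly used form of the statistic, the difference between two independent difficulty estimates divided by the square root of the sum of their squared standard errors, and it assumes both calibrations have already been placed on a common scale. The abstract does not give the formula or the sign convention, so both are assumptions here, and the function names and item values are hypothetical rather than taken from the study.

```python
import math


def separate_calibration_t(d_low, se_low, d_high, se_high):
    """Standardized difference between two independent calibrations of one item.

    Assumed form: (d_high - d_low) / sqrt(se_low^2 + se_high^2).
    The direction of the difference is an assumption, not taken from the study.
    """
    return (d_high - d_low) / math.sqrt(se_low ** 2 + se_high ** 2)


def flag_noninvariant(items, critical=2.0):
    """Return (name, t) for items whose |t| exceeds the critical value."""
    flagged = []
    for name, d_low, se_low, d_high, se_high in items:
        t = separate_calibration_t(d_low, se_low, d_high, se_high)
        if abs(t) > critical:
            flagged.append((name, round(t, 2)))
    return flagged


# Hypothetical calibrations in logits (NOT data from the study):
# (item, difficulty in low group, SE, difficulty in high group, SE)
items = [
    ("item01", -0.50, 0.08, -0.45, 0.09),  # nearly identical estimates
    ("item02", 0.20, 0.07, 1.10, 0.10),    # much harder in the high group
    ("item03", 1.30, 0.10, 0.60, 0.08),    # much easier in the high group
]

flagged = flag_noninvariant(items)
for name, t in flagged:
    print(f"{name}: t = {t:+.2f}")
```

With samples of 1,000 persons per group the standard errors are small, so even moderate differences in difficulty produce large t-values. This is consistent with the wide range of t-values the abstract reports.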