Dimitrov Dimiter M
George Mason University, Fairfax, VA, USA.
National Center for Assessment, Riyadh, Saudi Arabia.
Educ Psychol Meas. 2016 Dec;76(6):954-975. doi: 10.1177/0013164416631100. Epub 2016 Feb 16.
This article describes an approach to test scoring, referred to as delta scoring (D-scoring), for tests with dichotomously scored items. The D-scoring uses information from item response theory (IRT) calibration to facilitate computations and interpretations in the context of large-scale assessments. The D-score is computed from the examinee's response vector, which is weighted by the expected difficulties (not "easiness") of the test items. The expected difficulty of each item is obtained as an analytic function of its IRT parameters. The D-scores are independent of the sample of test-takers as they are based on expected item difficulties. It is shown that the D-scale performs a good bit better than the IRT logit scale by criteria of scale intervalness. To equate D-scales, it is sufficient to rescale the item parameters, thus avoiding tedious and error-prone procedures of mapping test characteristic curves under the method of IRT true score equating, which is often used in the practice of large-scale testing. The proposed D-scaling proved promising under its current piloting with large-scale assessments, and the hope is that it can efficiently complement IRT procedures in the practice of large-scale testing in the fields of education and psychology.
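The weighting idea in the abstract can be illustrated with a minimal sketch. It is not taken from the article: it assumes a two-parameter normal-ogive model with standard-normal ability, under which the expected probability of an incorrect response to an item with discrimination a and difficulty b is Φ(ab / √(1 + a²)). It also assumes the weighted sum is normalized by the total of the weights so that scores fall between 0 and 1. The function names, the parameter values, and the response vector are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_difficulty(a, b):
    """Expected probability of an incorrect response to one item.

    Assumption: two-parameter normal-ogive model, ability ~ N(0, 1),
    with a = discrimination and b = difficulty on that metric.
    """
    return normal_cdf(a * b / math.sqrt(1.0 + a * a))


def d_score(responses, item_params):
    """Difficulty-weighted score for a vector of 0/1 responses.

    Assumption: the weighted sum is divided by the sum of the weights,
    so an all-correct vector scores 1 and an all-wrong vector scores 0.
    """
    deltas = [expected_difficulty(a, b) for a, b in item_params]
    earned = sum(u * d for u, d in zip(responses, deltas))
    return earned / sum(deltas)


# Hypothetical (a, b) pairs for three items, from easy to hard.
item_params = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.5)]
responses = [1, 1, 0]  # correct on the two easier items only
print(round(d_score(responses, item_params), 3))
```

Because the weights depend only on the item parameters, two examinees with the same number of correct answers can receive different scores in this sketch: a correct answer on a harder item contributes more than one on an easier item.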