Wang L, Li C S
Teachers College, University of Cincinnati, OH 45221-0002, USA.
J Appl Meas. 2001;2(4):356-78.
In the past two decades of psychometric research, an array of extended item response models has been proposed to capture the complex nature of human cognition. While the literature abounds with model-fit analyses, the debate over model selection under different testing conditions continues. This study examines the problems of model selection in computer adaptive testing (CAT) of cognitive errors by comparing the measurement efficiency of polytomous modeling relative to dichotomous modeling under different scoring schemes and termination criteria. Monte Carlo simulation was adopted as the inquiry paradigm to generate 1000 subjects and 100 items in the calibration sample and 200 simulees in the CAT sample. The results suggest that polytomous CAT yields marginal gains over dichotomous CAT when termination criteria are more stringent (a shorter test length or a smaller standard error of the ability estimate). When the conventional dichotomous scoring scheme is adopted, in which all partially correct answers are scored as incorrect, polytomous CAT cannot prevent the non-uniform gain in test information as was observed in paper-and-pencil testing.
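The comparison above turns on how much test information each scoring scheme contributes per administered item, and on when the adaptive test stops. The sketch below illustrates that idea under assumptions the abstract does not state: polytomous responses follow the generalized partial credit model (GPCM), dichotomous scoring credits only the top category (mirroring the conventional scheme in which partially correct answers count as incorrect), items are chosen by maximum information, and testing stops at a fixed length or once the standard error 1/sqrt(I(theta)) reaches a target. All function names, item parameters, and thresholds are hypothetical, and this is not the authors' simulation code.

```python
import math

# Hypothetical sketch: the GPCM is ASSUMED as the polytomous model; the
# source abstract does not name the model it used.

def gpcm_probs(theta, a, steps):
    """Category probabilities P(X = 0..m | theta) for one GPCM item."""
    z = [0.0]  # category 0 has a zero cumulative logit
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    zmax = max(z)  # subtract the max for numerical stability
    ex = [math.exp(v - zmax) for v in z]
    total = sum(ex)
    return [e / total for e in ex]

def info_polytomous(theta, a, steps):
    """GPCM item information: a^2 * Var(X | theta)."""
    p = gpcm_probs(theta, a, steps)
    mean = sum(k * pk for k, pk in enumerate(p))
    return a * a * sum((k - mean) ** 2 * pk for k, pk in enumerate(p))

def info_dichotomized(theta, a, steps):
    """Information when only the top category scores 1 (partial credit -> 0)."""
    p = gpcm_probs(theta, a, steps)
    m = len(steps)
    mean = sum(k * pk for k, pk in enumerate(p))
    p_top = p[m]
    deriv = a * p_top * (m - mean)  # dP(X = m)/dtheta under the GPCM
    return deriv ** 2 / (p_top * (1.0 - p_top))  # Bernoulli Fisher information

def select_item(theta, bank, used, info_fn):
    """Maximum-information selection among items not yet administered."""
    candidates = [i for i in range(len(bank)) if i not in used]
    return max(candidates, key=lambda i: info_fn(theta, *bank[i]))

def should_stop(n_items, total_info, max_items=20, se_target=0.3):
    """Stop at a fixed length or once SE = 1/sqrt(test information) is small enough."""
    se = 1.0 / math.sqrt(total_info) if total_info > 0 else math.inf
    return n_items >= max_items or se <= se_target

# Hypothetical item bank: (discrimination a, list of step difficulties)
bank = [(1.0, [-1.0, 0.0, 1.0]), (1.5, [-0.5, 0.5]), (0.8, [0.0, 1.0])]

for name, info_fn in [("polytomous", info_polytomous),
                      ("dichotomized", info_dichotomized)]:
    used, total = set(), 0.0
    theta = 0.0  # held fixed here; a real CAT re-estimates theta after each response
    while not should_stop(len(used), total, max_items=len(bank)):
        i = select_item(theta, bank, used, info_fn)
        used.add(i)
        total += info_fn(theta, *bank[i])
    print(name, "items:", len(used), "SE:", round(1.0 / math.sqrt(total), 3))
```

Because the dichotomized score is a coarsening of the polytomous score, its information can never exceed the GPCM information at any theta, and the two coincide for single-step (binary) items; how large the gap is, and where on the ability scale it falls, depends on the item parameters.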