van Rijn Peter W, Ali Usama S
ETS Global, Amsterdam, The Netherlands.
Educational Testing Service, Princeton, New Jersey, USA.
Br J Math Stat Psychol. 2017 May;70(2):317-345. doi: 10.1111/bmsp.12101.
We compare three modelling frameworks for accuracy and speed of item responses in the context of adaptive testing. The first framework is based on modelling scores that result from a scoring rule that incorporates both accuracy and speed. The second framework is the hierarchical modelling approach developed by van der Linden (2007, Psychometrika, 72, 287) in which a regular item response model is specified for accuracy and a log-normal model for speed. The third framework is the diffusion framework in which the response is assumed to be the result of a Wiener process. Although the three frameworks differ in the relation between accuracy and speed, one commonality is that the marginal model for accuracy can be simplified to the two-parameter logistic model. We discuss both conditional and marginal estimation of model parameters. Models from all three frameworks were fitted to data from a mathematics and spelling test. Furthermore, we applied a linear and adaptive testing mode to the data off-line in order to determine differences between modelling frameworks. It was found that a model from the scoring rule framework outperformed a hierarchical model in terms of model-based reliability, but the results were mixed with respect to correlations with external measures.
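As a minimal sketch of the models named in the abstract (not the authors' implementation), the following Python code writes out the two-parameter logistic (2PL) model for accuracy, the log-normal response-time density used in the hierarchical framework, and the probability of reaching the upper boundary of a Wiener process. The function names, the parameter names (theta, tau, a, b, alpha, beta, drift, boundary), the unbiased starting point and the unit diffusion coefficient are illustrative assumptions, not taken from the paper. Under those assumptions the diffusion accuracy probability has the same logistic form as the 2PL model, which illustrates the commonality noted above.

```python
import math


def p_correct_2pl(theta, a, b):
    """2PL probability of a correct response for ability theta,
    item discrimination a and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def lognormal_rt_density(t, tau, alpha, beta):
    """Log-normal response-time density: log(t) ~ Normal(beta - tau, 1 / alpha**2).
    tau is the person speed, beta the item time intensity and alpha the
    item time discrimination (names assumed for illustration)."""
    z = alpha * (math.log(t) - (beta - tau))
    return alpha / (t * math.sqrt(2.0 * math.pi)) * math.exp(-0.5 * z * z)


def hierarchical_loglik(x, t, theta, tau, a, b, alpha, beta):
    """Log-likelihood of one scored response x (0 or 1) and its time t,
    assuming accuracy and time are independent given theta and tau."""
    p = p_correct_2pl(theta, a, b)
    log_p_x = math.log(p if x == 1 else 1.0 - p)
    return log_p_x + math.log(lognormal_rt_density(t, tau, alpha, beta))


def p_upper_wiener(drift, boundary, start=None, s=1.0):
    """Probability that a Wiener process with the given drift and diffusion
    coefficient s, started at `start` between 0 and `boundary`, reaches the
    upper boundary first. Defaults to an unbiased start at boundary / 2."""
    if start is None:
        start = boundary / 2.0
    if drift == 0.0:
        return start / boundary
    k = 2.0 * drift / (s * s)
    return (1.0 - math.exp(-k * start)) / (1.0 - math.exp(-k * boundary))
```

With an unbiased start and s = 1, `p_upper_wiener(theta - b, a)` agrees with `p_correct_2pl(theta, a, b)` up to floating-point error, so the boundary separation plays the role of the discrimination and the drift the role of ability minus difficulty in this simplified setting.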