Sari Halil Ibrahim, Raborn Anthony
Kilis 7 Aralik University, Turkey.
University of Florida, Gainesville, USA.
Appl Psychol Meas. 2018 Sep;42(6):499-515. doi: 10.1177/0146621617752990. Epub 2018 Feb 4.
Many item selection methods have been proposed for computerized adaptive testing (CAT) applications, but not all of them have been used in computerized multistage testing (ca-MST). This study uses five item selection methods as routing methods in the ca-MST framework: maximum Fisher information (MFI), maximum likelihood weighted information (MLWI), maximum posterior weighted information (MPWI), Kullback-Leibler (KL), and posterior Kullback-Leibler (KLP). The main purpose is to examine how these methods perform when used as routing methods in ca-MST applications. The five methods were tested under four ca-MST panel designs and two test lengths (30 items and 60 items) using the parameters of a real item bank. Results were evaluated with overall outcomes (mean bias, root mean square error, correlation between true and estimated thetas, and module exposure rates) and conditional outcomes (conditional absolute bias, standard error of measurement, and root mean square error). Test length affected the outcomes much more than the other study conditions did. Under the 30-item conditions, 1-3 designs outperformed the other panel designs; under the 60-item conditions, 1-3-3 designs performed better than the others. Each routing method performed well under particular conditions, and no method was clearly best across the studied conditions. Recommendations for routing methods under particular conditions are provided for researchers and practitioners, along with the limitations of these results.
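To make the idea of using an item selection criterion as a routing rule concrete, the following is a minimal sketch of MFI routing: after a stage, the examinee is sent to the next-stage module with the largest Fisher information at the current ability estimate. The abstract does not state the IRT model or the module contents, so the two-parameter logistic (2PL) model, the function names, and the item parameters below are all assumptions chosen for illustration, not the authors' implementation.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of one 2PL item at theta: a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def module_information(theta, module):
    """Information of a module is the sum of its item informations."""
    return sum(item_information(theta, a, b) for a, b in module)

def route_mfi(theta_hat, next_stage_modules):
    """Return the name of the next-stage module with maximum
    Fisher information at the current ability estimate theta_hat."""
    return max(
        next_stage_modules,
        key=lambda name: module_information(theta_hat, next_stage_modules[name]),
    )

# Hypothetical stage-2 modules of a 1-3 panel design.
# Each item is an (a, b) pair: discrimination and difficulty.
stage2 = {
    "easy":   [(1.2, -1.5), (1.0, -1.0), (0.9, -1.2)],
    "medium": [(1.2,  0.0), (1.0,  0.2), (0.9, -0.1)],
    "hard":   [(1.2,  1.5), (1.0,  1.0), (0.9,  1.2)],
}

for theta_hat in (-1.3, 0.1, 1.4):
    print(theta_hat, "->", route_mfi(theta_hat, stage2))
```

The weighted criteria named in the abstract (MLWI, MPWI) would replace the point evaluation at `theta_hat` with an information function integrated against a likelihood or posterior weight; that extension is not shown here.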