Beijing Key Laboratory of Learning and Cognition, School of Psychology, Capital Normal University, No. 23 Bai Dui Zi Jia, Beijing, 100048, China.
Beijing Key Laboratory of Applied Experimental Psychology, National Demonstration Center for Experimental Psychology Education (Beijing Normal University), Faculty of Psychology, Beijing Normal University, No. 19 Xin Jie Kou Wai Street, Beijing, 100875, China.
Behav Res Methods. 2024 Dec;56(8):8695-8714. doi: 10.3758/s13428-024-02498-x. Epub 2024 Sep 13.
Computerized adaptive testing (CAT) aims to present items that statistically optimize the assessment process by considering the examinee's responses and estimated trait levels. Recent developments in reinforcement learning and deep neural networks give CAT the potential to select items that exploit information across all remaining items in the test, rather than focusing only on the next few items to be selected. In this study, we reformulate CAT under the reinforcement learning framework and propose a new item selection strategy based on the deep Q-network (DQN) method. Through simulated and empirical studies, we demonstrate how to monitor the training process to obtain the optimal Q-networks, and we compare the accuracy of the DQN-based item selection strategy with that of five traditional strategies (maximum Fisher information, Fisher information weighted by likelihood, Kullback-Leibler information weighted by likelihood, maximum posterior weighted information, and maximum expected information) on both simulated and real item banks and responses. We further investigate how the sample size and the trait-level distribution of the examinees used in training affect DQN performance. The results show that, in most conditions, DQN achieves lower RMSE and MAE values than the traditional strategies on both simulated and real item banks and responses. Suggestions for the use of DQN-based strategies are provided, along with their code.
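To make the abstract's framing concrete, the sketch below shows (in broad strokes, not the authors' implementation) how item selection looks once CAT is cast as a reinforcement learning problem: the state encodes the current trait estimate and which items have already been administered, a Q-function scores every candidate item, and the highest-scoring unadministered item is presented next. All names are illustrative, and a trained deep Q-network is stood in for by a random linear map.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items = 10  # size of the (toy) item bank

def make_state(theta_hat, administered):
    """State = [current trait estimate, binary administered-item mask]."""
    mask = np.zeros(n_items)
    mask[list(administered)] = 1.0
    return np.concatenate(([theta_hat], mask))

# Stand-in for a trained deep Q-network: a fixed random linear map
# from the state vector to one Q-value per item in the bank.
W = rng.normal(size=(n_items, n_items + 1))

def select_item(state, administered):
    """Greedy action: argmax of Q-values over unadministered items."""
    q = W @ state
    q[list(administered)] = -np.inf  # never re-administer an item
    return int(np.argmax(q))

administered = set()
theta_hat = 0.0
for _ in range(5):  # administer a fixed-length 5-item adaptive test
    item = select_item(make_state(theta_hat, administered), administered)
    administered.add(item)
    # In a real CAT loop, the examinee's response would be recorded here
    # and theta_hat re-estimated (e.g., by MLE or EAP) before the next step.

print(sorted(administered))
```

The masking step is what distinguishes this from a generic DQN policy: the action space shrinks as the test proceeds, since each item may be presented at most once.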