Zoucha James, Himelfarb Igor, Tang Nai-En
University of Northern Colorado, Greeley, CO, USA.
National Board of Chiropractic Examiners, Greeley, CO, USA.
Educ Psychol Meas. 2025 May 3:00131644251332972. doi: 10.1177/00131644251332972.
This study explored deep reinforcement learning (DRL) as an approach to optimizing test length. The primary aim was to evaluate whether the current length of the National Board of Chiropractic Examiners Part I Exam is justified. The problem was modeled as a combinatorial optimization task within a Markov decision process framework, and an algorithm was used to construct test forms from a finite set of items while adhering to critical structural constraints, such as content representation and item difficulty distribution. The findings show that, although the DRL algorithm identified shorter test forms that maintained comparable accuracy in ability estimation, those shorter forms did not satisfy the structural constraints, so the existing test length of 240 items remains advisable. The study also highlighted the capacity of DRL to learn continuously about a test-taker's latent ability and to adjust dynamically to response patterns, which makes it well suited to personalized testing environments. This capability supports real-time decision-making in item selection, improving both the efficiency and the precision of ability estimation. Future research is encouraged to expand the item bank and to use greater computational resources to strengthen the algorithm's search for shorter, structurally compliant test forms.
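The Markov decision process framing described above can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the two-parameter logistic (2PL) item model, the Fisher-information reward, the synthetic item bank, the placeholder content areas "A", "B", and "C", and the greedy policy, which stands in for a learned DRL policy. The state is the set of items selected so far, an action adds one item, and per-area quotas act as the content constraint.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: int
    domain: str   # content area (placeholder labels, not the exam's real areas)
    a: float      # discrimination under an assumed 2PL model
    b: float      # difficulty under an assumed 2PL model


def fisher_information(item, theta):
    """Item information at ability theta for the 2PL model."""
    p = 1.0 / (1.0 + math.exp(-item.a * (theta - item.b)))
    return item.a ** 2 * p * (1.0 - p)


class FormAssemblyEnv:
    """Toy MDP: state = selected items, action = add one item to the form."""

    def __init__(self, bank, quotas, theta_ref=0.0):
        self.bank = bank
        self.quotas = quotas          # required number of items per content area
        self.theta_ref = theta_ref    # reference ability used by the reward
        self.reset()

    def reset(self):
        self.selected = []
        self.counts = {d: 0 for d in self.quotas}
        return self._state()

    def _state(self):
        return tuple(sorted(it.item_id for it in self.selected))

    def legal_actions(self):
        # An item is legal if it is unused and its content area is not yet full.
        chosen = {it.item_id for it in self.selected}
        return [
            it for it in self.bank
            if it.item_id not in chosen
            and self.counts.get(it.domain, 0) < self.quotas.get(it.domain, 0)
        ]

    def step(self, item):
        self.selected.append(item)
        self.counts[item.domain] += 1
        reward = fisher_information(item, self.theta_ref)
        full = len(self.selected) == sum(self.quotas.values())
        done = full or not self.legal_actions()
        return self._state(), reward, done


def greedy_policy(env):
    """Stand-in for a learned policy: pick the most informative legal item."""
    return max(env.legal_actions(),
               key=lambda it: fisher_information(it, env.theta_ref))


def assemble(env):
    """Run one episode and return the assembled form and its total reward."""
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done = env.step(greedy_policy(env))
        total += reward
    return env.selected, total


def make_bank(n=60, seed=0):
    """Synthetic item bank; parameters are random, not real exam data."""
    rng = random.Random(seed)
    domains = ["A", "B", "C"]
    return [Item(i, domains[i % 3], rng.uniform(0.5, 2.0), rng.uniform(-2.0, 2.0))
            for i in range(n)]


if __name__ == "__main__":
    env = FormAssemblyEnv(make_bank(), {"A": 4, "B": 3, "C": 3})
    form, total_info = assemble(env)
    print(len(form), round(total_info, 3))
```

In this sketch the constraint is enforced by masking illegal actions, so every completed form meets the quotas by construction. A DRL agent would replace `greedy_policy` with a trained network, and a difficulty-distribution constraint could be handled the same way or as a penalty term in the reward.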