Comparing computer adaptive testing stopping rules under the generalized partial-credit model.

Affiliations

University of Texas at Austin, Austin, TX, USA.

Educational Testing Service, Princeton, NJ, USA.

Publication information

Behav Res Methods. 2019 Jun;51(3):1305-1320. doi: 10.3758/s13428-018-1068-x.

DOI:10.3758/s13428-018-1068-x

PMID:29926441

Abstract

An important consideration in any computer adaptive testing (CAT) program is the criterion used for ending item administration: the stopping rule, which ensures that all examinees are assessed to the same standard. Although various stopping rules exist, none of them have been compared under the generalized partial-credit model (Muraki in Applied Psychological Measurement, 16, 159-176, 1992). In this simulation study we compared the performance of three variable-length stopping rules, namely standard error (SE), minimum information (MI), and change in theta (CT), both in isolation and in combination with requirements of minimum and maximum numbers of items, as well as a fixed-length stopping rule. Each stopping rule was examined under two termination criteria: one more lenient (SE = 0.35, MI = 0.56, CT = 0.05) and one more stringent (SE = 0.30, MI = 0.42, CT = 0.02). The simulation design also included content-balancing and exposure controls, aspects of CAT that have been excluded in previous research comparing variable-length stopping rules. The minimum-information stopping rule produced biased theta estimates and varied greatly in measurement quality across the theta distribution. The absolute-change-in-theta stopping rule performed strongly when paired with a lower criterion and a minimum test length. The standard error stopping rule consistently provided the best balance of measurement precision and operational efficiency and required the fewest administered items necessary to obtain accurate and precise theta estimates, particularly when it was paired with a maximum-number-of-items stopping rule.
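The three variable-length rules named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulation code: the abstract does not give the exact operational definitions, so the sketch uses the textbook forms (SE as the reciprocal square root of test information, MI as "no remaining item reaches the information threshold", CT as the absolute change between successive theta estimates). The function names `gpcm_probs`, `gpcm_info`, and `should_stop`, and all parameter names, are hypothetical, and the model is written without a scaling constant.

```python
import math

def gpcm_probs(theta, a, steps):
    """Category probabilities for one generalized partial-credit item.

    `a` is the discrimination and `steps` the step parameters b_1..b_m
    (hypothetical names; no scaling constant D is applied).
    """
    # Cumulative sums of a * (theta - b_v); category 0 has an empty sum.
    z = [0.0]
    for b in steps:
        z.append(z[-1] + a * (theta - b))
    z_max = max(z)  # subtract the maximum for numerical stability
    expz = [math.exp(v - z_max) for v in z]
    total = sum(expz)
    return [e / total for e in expz]

def gpcm_info(theta, a, steps):
    """Item information: a^2 times the variance of the item score at theta."""
    p = gpcm_probs(theta, a, steps)
    mean = sum(k * pk for k, pk in enumerate(p))
    second_moment = sum(k * k * pk for k, pk in enumerate(p))
    return a * a * (second_moment - mean * mean)

def should_stop(rule, threshold, n_given, test_info, max_remaining_info,
                theta_now, theta_prev, min_items=0, max_items=None):
    """Decide whether to end the test (assumed textbook rule definitions)."""
    # A maximum test length always ends the test once it is reached.
    if max_items is not None and n_given >= max_items:
        return True
    # A minimum test length blocks every variable-length rule until met.
    if n_given < min_items:
        return False
    if rule == "SE":  # standard error of theta at or below the threshold
        return test_info > 0 and 1.0 / math.sqrt(test_info) <= threshold
    if rule == "MI":  # no remaining item offers enough information
        return max_remaining_info < threshold
    if rule == "CT":  # theta estimate has stabilized between items
        return theta_prev is not None and abs(theta_now - theta_prev) < threshold
    raise ValueError(f"unknown rule: {rule}")
```

Under these assumptions, the lenient SE criterion of 0.35 corresponds to accumulated test information of roughly 8.2 (1 / 0.35^2), and the stringent criterion of 0.30 to roughly 11.1, which is why the stricter setting requires more items before the rule fires.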
