Liu Xiaowen, Loken Eric
Key Research Base of Humanities and Social Sciences of the Ministry of Education, Academy of Psychology and Behavior, Tianjin Normal University, Tianjin, China.
Faculty of Psychology, Tianjin Normal University, China.
Educ Psychol Meas. 2025 Jan 7:00131644241306990. doi: 10.1177/00131644241306990.
In computerized adaptive testing (CAT), examinees see items targeted to their ability level. Postoperational CAT data therefore contain a much larger share of missing responses than designs in which every examinee answers every item. Because each item is answered only by examinees within a restricted range of abilities, item-total score correlations are reduced. However, if adaptive item selection depends only on observed responses, the data are missing at random (MAR). We simulated data from three testing designs (common items, randomly selected items, and CAT) and found that both person and item parameters could be re-estimated from postoperational CAT data. For a multidimensional CAT, we show that all responses from the testing phase must be included to avoid violating the missing-data assumptions. We also observed that some CAT designs produced "reversals," in which item discriminations became negative, causing dramatic under- and overestimation of abilities. Our results apply whenever researchers work with data drawn from adaptive tests or from instructional tools with adaptive delivery. To avoid bias, researchers must ensure that they use all the data needed to satisfy the MAR assumption.
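The MAR argument can be illustrated with a minimal Python sketch of a unidimensional CAT. This is not the authors' simulation code: the 2PL model, the randomly generated item bank, maximum-information item selection, and EAP scoring on a fixed grid are illustrative assumptions, and the helper names (`make_bank`, `next_item`, `run_cat`, etc.) are hypothetical. What the sketch shows is that each next item is computed only from the item bank and the responses already observed, so which items go unanswered is a function of observed data.

```python
import math
import random

def make_bank(n_items, seed=0):
    """Hypothetical 2PL bank of (discrimination a, difficulty b) pairs; values are illustrative only."""
    rng = random.Random(seed)
    return [(rng.uniform(0.8, 2.0), rng.gauss(0.0, 1.0)) for _ in range(n_items)]

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_info(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

GRID = [-4.0 + 0.1 * k for k in range(81)]  # quadrature points from -4 to 4

def eap(bank, administered, responses):
    """EAP ability estimate under a standard normal prior, using only the observed responses."""
    num = den = 0.0
    for t in GRID:
        w = math.exp(-0.5 * t * t)  # unnormalized N(0, 1) prior weight
        for j, u in zip(administered, responses):
            p = p_correct(t, *bank[j])
            w *= p if u == 1 else 1.0 - p
        num += t * w
        den += w
    return num / den

def next_item(bank, administered, responses):
    """Maximum-information selection at the current EAP estimate.
    The choice depends only on the bank and the responses observed so far."""
    theta_hat = eap(bank, administered, responses)
    remaining = [j for j in range(len(bank)) if j not in administered]
    return max(remaining, key=lambda j: item_info(theta_hat, *bank[j]))

def run_cat(bank, answer, test_length):
    """Administer a fixed-length CAT; answer(j) returns the 0/1 response to item j."""
    administered, responses = [], []
    for _ in range(test_length):
        j = next_item(bank, administered, responses)
        administered.append(j)
        responses.append(answer(j))
    return administered, responses

# Simulate one examinee with an assumed true ability of 1.2.
bank = make_bank(50)
rng = random.Random(1)
items, resp = run_cat(bank, lambda j: int(rng.random() < p_correct(1.2, *bank[j])), 10)

# Replaying the recorded responses (no access to the true ability) reproduces the same item sequence.
replay = dict(zip(items, resp))
items2, resp2 = run_cat(bank, lambda j: replay[j], 10)
assert items2 == items
```

Because replaying the recorded responses reproduces the same item sequence, the missingness pattern is determined entirely by the observed responses. If an analysis later drops some of the responses that fed into `next_item` (for instance, responses on another dimension in a multidimensional CAT), selection would depend on data the analysis no longer conditions on, which is the kind of MAR violation the abstract warns about.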