Tian Chen, Choi Jaehwa
Department of Human Development and Quantitative Methodology, University of Maryland, College Park, MD, USA.
Department of Educational Leadership, The George Washington University, Washington, DC, USA.
Appl Psychol Meas. 2023 Jun;47(4):275-290. doi: 10.1177/01466216231165313. Epub 2023 Mar 17.
Sibling items developed through automatic item generation share similar but not identical psychometric properties. However, accounting for variation among sibling items may be computationally demanding and yield little improvement in scoring. Assuming identical characteristics among siblings, this study explores the impact of item model parameter variations (i.e., within-family variation between siblings) on person parameter estimation in linear tests and Computerized Adaptive Testing (CAT). Specifically, we explore (1) what happens when small, medium, or large within-family variance is ignored, (2) whether the effect of larger within-model variance can be compensated for by greater test length, (3) whether the item model pool properties affect the impact of within-family variance on scoring, and (4) whether the issues in (1) and (2) differ between linear and adaptive testing. The related sibling model is used for data generation, and the identical sibling model is assumed for scoring. Manipulated factors include test length, the size of within-model variation, and item model pool characteristics. Results show that as within-family variance increases, the standard error of scores remains at similar levels. For RMSE and the correlation between true and estimated scores, the effect of larger within-model variance is compensated for by test length. Scores are biased towards the center, and this bias is not compensated for by test length. Although the within-family variation is random in the current simulations, to yield less biased ability estimates the item model pool should provide balanced opportunities such that "fake-easy" and "fake-difficult" item instances cancel each other's effects. The results for CAT are similar to those for linear tests, except for higher efficiency.
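The simulation design described in the abstract (generate responses with sibling-specific parameters, then score as if all siblings in a family were identical) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the Rasch model, normally distributed within-family difficulty shifts, EAP scoring on a fixed grid with a standard normal prior, and the hypothetical names `p_correct`, `eap_score`, and `simulate`.

```python
import math
import random


def p_correct(theta, b):
    """Rasch probability of a correct response (model choice is an assumption)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def eap_score(responses, difficulties, grid):
    """EAP ability estimate under a standard normal prior on a fixed grid."""
    num = den = 0.0
    for t in grid:
        log_post = -0.5 * t * t  # log N(0, 1) prior, up to a constant
        for u, b in zip(responses, difficulties):
            p = p_correct(t, b)
            log_post += math.log(p if u else 1.0 - p)
        w = math.exp(log_post)
        num += t * w
        den += w
    return num / den


def simulate(n_persons, test_length, within_sd, seed=0):
    """Generate with sibling difficulties, score with family difficulties."""
    rng = random.Random(seed)
    grid = [-4.0 + 0.1 * k for k in range(81)]
    # One difficulty per item family (hypothetical linear test form).
    family_b = [rng.gauss(0.0, 1.0) for _ in range(test_length)]
    true, est = [], []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)
        # Each examinee sees a freshly drawn sibling from every family.
        sibling_b = [b + rng.gauss(0.0, within_sd) for b in family_b]
        resp = [rng.random() < p_correct(theta, b) for b in sibling_b]
        true.append(theta)
        # Scoring ignores the within-family variation.
        est.append(eap_score(resp, family_b, grid))
    n = len(true)
    bias = sum(e - t for e, t in zip(est, true)) / n
    rmse = math.sqrt(sum((e - t) ** 2 for e, t in zip(est, true)) / n)
    return {"bias": bias, "rmse": rmse}
```

Calling `simulate(500, 20, 0.0)` and `simulate(500, 20, 0.5)` with the same seed gives a rough comparison between a no-variation baseline and a condition in which within-family variation is ignored. The baseline matters because EAP scoring shrinks estimates toward the prior mean on its own. The sketch reports only overall mean bias, so examining bias towards the center would require grouping examinees by true ability.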