Jiang Zhehan, Han Yuting, Xu Lingling, Shi Dexin, Liu Ren, Ouyang Jinying, Cai Fen
Peking University Health Science Center, Beijing, China.
University of South Carolina, Columbia, USA.
Educ Psychol Meas. 2023 Oct;83(5):984-1006. doi: 10.1177/00131644221120899. Epub 2022 Sep 4.
In the nonequivalent groups with anchor test (NEAT) design, the responses that are absent by design can be treated as a planned missing-data scenario. For small samples, we present a machine learning (ML)-based imputation technique called chaining random forests (CRF) to perform equating within the NEAT design. Specifically, seven CRF-based imputation equating methods are proposed, each based on a different data augmentation method. The equating performance of the proposed methods is examined through a simulation study. Five factors are considered: (a) test length (20, 30, 40, 50), (b) sample size per test form (50 versus 100), (c) ratio of common/anchor items (0.2 versus 0.3), (d) equivalent versus nonequivalent groups taking the two forms (no mean difference versus a mean difference of 0.5), and (e) three types of anchors (random, easy, and hard), resulting in 96 conditions. Five traditional equating methods were also considered: (1) the Tucker method, (2) the Levine observed score method, (3) the equipercentile equating method, (4) the circle-arc method, and (5) concurrent calibration based on the Rasch model. Together with the seven CRF-based imputation equating methods, this gives a total of 12 methods in the study. The findings suggest that, benefiting from the advantages of ML techniques, CRF-based methods that incorporate the equating result of the Tucker method, such as the IMP_total_Tucker, IMP_pair_Tucker, and IMP_Tucker_cirlce methods, can yield more robust and trustworthy estimates for the "missingness" in an equating task and therefore produce more accurate equated scores than the other methods in short tests with small samples.
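The count of 96 conditions follows from fully crossing the five factors (4 × 2 × 2 × 2 × 3). The following is a minimal sketch that enumerates the design grid, assuming the factors are fully crossed as the abstract's total implies. The dictionary keys and the function name `simulation_conditions` are illustrative and are not taken from the study.

```python
from itertools import product

# Factor levels as listed in the abstract; the key names are illustrative only.
FACTORS = {
    "test_length": (20, 30, 40, 50),
    "sample_size_per_form": (50, 100),
    "anchor_ratio": (0.2, 0.3),
    "group_mean_difference": (0.0, 0.5),  # equivalent vs. nonequivalent groups
    "anchor_type": ("random", "easy", "hard"),
}


def simulation_conditions(factors=FACTORS):
    """Return one dict per cell of the fully crossed design."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


conditions = simulation_conditions()
print(len(conditions))  # 4 * 2 * 2 * 2 * 3 = 96
```

Each element of `conditions` describes one simulation cell, to which each of the 12 equating methods would then be applied.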