Karagianni Marianna, Tsaousis Ioannis
Department of Psychology, School of Social Sciences, University of Crete, 74100 Rethymno, Greece.
Department of Psychology, National and Kapodistrian University of Athens, 15784 Athens, Greece.
Behav Sci (Basel). 2025 Feb 25;15(3):268. doi: 10.3390/bs15030268.
The goal of the present study is to describe the methods used to assess the effectiveness and psychometric properties of Numetrive, a newly developed computerized adaptive testing (CAT) system that measures numerical reasoning. For this purpose, an item bank of 174 items was developed and concurrently equated and calibrated using the two-parameter logistic model (2PLM). Item difficulties range from -3.4 to 2.7, and discriminations range from 0.51 to 1.6. Numetrive combines three algorithmic components: maximum likelihood estimation with fences (MLEF) for ability estimation, the progressive restricted standard error (PRSE) method for item selection and exposure control, and the standard error of estimation as the termination rule. The new CAT was evaluated in a Monte Carlo simulation study and proved highly efficient: across 5000 simulees, an average of 13.6 items was administered per simulee, while item exposure rates remained low. In addition, the simulees' ability estimates were exceptionally accurate, as indicated by several statistical indices, including bias, mean absolute error (MAE), and root mean square error (RMSE). Finally, a validity study was conducted to evaluate the concurrent, convergent, and divergent validity of the new CAT system. The findings verified Numetrive's robustness and applicability in the assessment of numerical reasoning.
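The adaptive loop summarized above (2PLM items, MLEF estimation, an SE-based stopping rule, exposure control, and a Monte Carlo evaluation with bias, MAE, and RMSE) can be sketched as a toy simulation. This is a minimal sketch under stated assumptions, not Numetrive's implementation. The item bank is randomly generated within the parameter ranges reported above. The scaling constant D = 1.702, the fence locations (θ = ±4) and fence discrimination (3.0), the SE target of 0.30, the 30-item cap, and the 0.25 exposure ceiling are illustrative choices. All function names are hypothetical. For item selection, the sketch uses a simpler rule, maximum Fisher information restricted by an exposure cap, in place of the study's PRSE procedure, whose progressive weighting is not reproduced here. The termination SE is computed from the administered items only, which is also an assumption.

```python
import math
import random

# Illustrative settings (assumptions, not values reported by the study).
D = 1.702                       # logistic scaling constant
FENCE_LO, FENCE_HI = -4.0, 4.0  # locations of the two fence items
FENCE_A = 3.0                   # discrimination of the fence items
SE_TARGET = 0.30                # stop once SE(theta) <= this value
MAX_ITEMS = 30                  # safety cap on test length
R_MAX = 0.25                    # maximum allowed item exposure rate


def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))


def info(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = p_correct(theta, a, b)
    return (D * a) ** 2 * p * (1.0 - p)


def mlef(responses):
    """ML ability estimate with fences: a lower fence scored correct and an
    upper fence scored incorrect keep the estimate finite, even for
    all-correct or all-incorrect response patterns."""
    data = responses + [(FENCE_A, FENCE_LO, 1), (FENCE_A, FENCE_HI, 0)]

    def score(theta):
        # Derivative of the 2PL log-likelihood; it decreases in theta.
        return sum(D * ai * (ui - p_correct(theta, ai, bi)) for ai, bi, ui in data)

    lo, hi = -10.0, 10.0  # bisection on the sign change of the score
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def run_cat(true_theta, bank, counts, n_done, rng):
    """Administer one adaptive test to a simulee; return (estimate, length)."""
    responses, used = [], set()
    theta, se = 0.0, float("inf")
    while len(responses) < MAX_ITEMS and se > SE_TARGET:
        # Restricted step: drop items whose exposure rate so far exceeds R_MAX.
        eligible = [i for i in range(len(bank))
                    if i not in used and counts[i] <= R_MAX * max(n_done, 1)]
        if not eligible:
            break
        # Simplified selection: most informative eligible item (not PRSE).
        j = max(eligible, key=lambda i: info(theta, *bank[i]))
        a, b = bank[j]
        u = 1 if rng.random() < p_correct(true_theta, a, b) else 0
        used.add(j)
        counts[j] += 1
        responses.append((a, b, u))
        theta = mlef(responses)
        total_info = sum(info(theta, ai, bi) for ai, bi, _ in responses)
        se = 1.0 / math.sqrt(total_info)
    return theta, len(responses)


def simulate(n_simulees=500, n_items=174, seed=1):
    """Monte Carlo run on a hypothetical bank; returns summary statistics."""
    rng = random.Random(seed)
    # Hypothetical bank drawn within the ranges reported in the abstract.
    bank = [(rng.uniform(0.51, 1.6), rng.uniform(-3.4, 2.7)) for _ in range(n_items)]
    counts = [0] * n_items
    errors, lengths = [], []
    for k in range(n_simulees):
        true_theta = rng.gauss(0.0, 1.0)
        est, length = run_cat(true_theta, bank, counts, k, rng)
        errors.append(est - true_theta)
        lengths.append(length)
    n = len(errors)
    return {
        "mean_length": sum(lengths) / n,
        "bias": sum(errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "max_exposure": max(counts) / n,
    }
```

Calling `simulate(n_simulees=500)` returns the mean test length, bias, MAE, RMSE, and maximum exposure rate for the toy bank. Its numbers will not match the study's results, because the bank and settings are invented. Bisection is used instead of Newton-Raphson because the 2PL log-likelihood is concave in θ, so a sign-change search on its derivative is guaranteed to converge.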