From Development to Validation: Exploring the Efficiency of Numetrive, a Computerized Adaptive Assessment of Numerical Reasoning.

Authors

Karagianni Marianna, Tsaousis Ioannis

Affiliations

Department of Psychology, School of Social Sciences, University of Crete, 74100 Rethymno, Greece.

Department of Psychology, National and Kapodistrian University of Athens, 15784 Athens, Greece.

Publication information

Behav Sci (Basel). 2025 Feb 25;15(3):268. doi: 10.3390/bs15030268.

DOI:10.3390/bs15030268

PMID:40150163

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11939369/

Abstract

The present study describes the methods used to assess the effectiveness and psychometric properties of Numetrive, a newly developed computerized adaptive test (CAT) of numerical reasoning. An item bank of 174 items was concurrently equated and calibrated using the two-parameter logistic model (2PLM), with item difficulties ranging from -3.4 to 2.7 and discriminations from 0.51 to 1.6. Numetrive combines maximum likelihood estimation with fences (MLEF) for ability estimation, progressive restricted standard error (PRSE) for item selection and exposure control, and the standard error of estimation as the termination rule. The CAT was evaluated in a Monte Carlo simulation study and performed efficiently: on average, 13.6 items were administered to each of 5000 simulees, while item exposure rates remained low. Ability estimates were highly accurate, as indicated by the bias statistic, mean absolute error (MAE), and root mean square error (RMSE). Finally, a validity study evaluated the concurrent, convergent, and divergent validity of the new CAT system. The findings support Numetrive's robustness and applicability in the evaluation of numerical reasoning.
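The pipeline summarized in the abstract (2PL item bank, adaptive item selection, a standard-error stopping rule, and recovery statistics from a Monte Carlo run) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several parts are assumptions made for illustration only: plain maximum-information selection stands in for PRSE (so there is no exposure control), a bounded grid search stands in for MLEF, item parameters are drawn uniformly within the ranges reported in the abstract, and the stopping threshold `se_target=0.4`, the cap `max_items=30`, and the 100 simulees are hypothetical values rather than the study's settings.

```python
import math
import random

def p_correct(theta, a, b):
    """2PL probability of a correct response (a: discrimination, b: difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def estimate_theta(responses, lo=-4.0, hi=4.0, steps=81):
    """Grid-search ML estimate bounded to [lo, hi] (a simple stand-in for MLEF).

    responses is a list of (a, b, u) tuples with u in {0, 1}.
    Returns (theta_hat, standard_error).
    """
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]

    def loglik(t):
        total = 0.0
        for a, b, u in responses:
            p = p_correct(t, a, b)
            total += math.log(p) if u else math.log(1.0 - p)
        return total

    theta_hat = max(grid, key=loglik)
    info = sum(item_information(theta_hat, a, b) for a, b, _ in responses)
    return theta_hat, 1.0 / math.sqrt(info)

def run_cat(true_theta, bank, rng, se_target=0.4, max_items=30):
    """Administer items until SE <= se_target (hypothetical threshold) or the cap is hit."""
    theta_hat, se = 0.0, float("inf")
    responses, unused = [], list(range(len(bank)))
    while unused and len(responses) < max_items and se > se_target:
        # Maximum-information selection (assumption: replaces PRSE).
        nxt = max(unused, key=lambda i: item_information(theta_hat, *bank[i]))
        unused.remove(nxt)
        a, b = bank[nxt]
        u = 1 if rng.random() < p_correct(true_theta, a, b) else 0
        responses.append((a, b, u))
        theta_hat, se = estimate_theta(responses)
    return theta_hat, se, len(responses)

def recovery_metrics(true_thetas, estimates):
    """Bias, mean absolute error, and root mean square error of the estimates."""
    errors = [e - t for t, e in zip(true_thetas, estimates)]
    n = len(errors)
    bias = sum(errors) / n
    mae = sum(abs(x) for x in errors) / n
    rmse = math.sqrt(sum(x * x for x in errors) / n)
    return bias, mae, rmse

rng = random.Random(0)
# Synthetic bank: 174 items within the parameter ranges reported in the abstract.
bank = [(rng.uniform(0.51, 1.6), rng.uniform(-3.4, 2.7)) for _ in range(174)]
true_thetas = [rng.gauss(0.0, 1.0) for _ in range(100)]  # the study used 5000 simulees
results = [run_cat(t, bank, rng) for t in true_thetas]
bias, mae, rmse = recovery_metrics(true_thetas, [r[0] for r in results])
mean_length = sum(r[2] for r in results) / len(results)
print(f"mean test length={mean_length:.1f} bias={bias:.3f} MAE={mae:.3f} RMSE={rmse:.3f}")
```

Because the bank, the selection rule, and the threshold here are illustrative, the printed values are not expected to match the figures reported in the study; the sketch only shows how test length, bias, MAE, and RMSE are obtained from simulated adaptive administrations.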
