Computerized adaptive testing for testlet-based innovative items.

Affiliations

University of Texas at Austin, Texas, USA.

National Council of State Boards of Nursing, Chicago, Illinois, USA.

Publication information

Br J Math Stat Psychol. 2022 Feb;75(1):136-157. doi: 10.1111/bmsp.12252. Epub 2021 Aug 30.

Abstract

The increasing use of innovative items in operational assessments has shed new light on polytomous testlet models. In this study, we examine the performance of several scoring models when polytomous items exhibit random testlet effects. Four models are considered: the partial credit model (PCM), the testlet-as-a-polytomous-item model (TPIM), the random-effect testlet model (RTM), and the fixed-effect testlet model (FTM). The models were evaluated in two adaptive testing scenarios in which testlets have nonzero random effects. The results suggest that, despite the manifest random testlet effects, PCM, FTM, and RTM perform comparably in trait recovery and examinee classification. The overall accuracy of PCM and FTM in trait inference was comparable to that of RTM. TPIM consistently underestimated population variance and substantially overestimated measurement precision, showing limited utility for operational use. The results have practical implications for the use of polytomous testlet scoring models.
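For readers unfamiliar with the models named in the abstract, the following is a minimal sketch of how category probabilities are commonly computed under the partial credit model. It is not taken from the study. The function name `pcm_probs`, its parameters, and the optional `gamma` term are assumptions introduced for illustration; `gamma` stands in for a person-specific testlet effect added to the trait, which is one common way a random-effect testlet extension is written and may differ from the parameterization the authors used.

```python
import math

def pcm_probs(theta, deltas, gamma=0.0):
    """Category probabilities for one item under the partial credit model.

    theta  : examinee trait level
    deltas : step difficulties delta_1..delta_m (gives m + 1 score categories)
    gamma  : hypothetical person-specific testlet effect added to theta;
             gamma = 0.0 reduces to the plain PCM (assumed form, not from the paper)
    """
    # Cumulative sums of (theta + gamma - delta_k); category 0 has sum 0.
    cum = [0.0]
    for d in deltas:
        cum.append(cum[-1] + (theta + gamma - d))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cum)
    expd = [math.exp(c - top) for c in cum]
    total = sum(expd)
    return [e / total for e in expd]

# Example: a three-category item with illustrative step difficulties.
probs = pcm_probs(theta=0.5, deltas=[-0.3, 0.8])
```

Under this sketch, ignoring a nonzero `gamma` amounts to scoring testlet items with the plain PCM, which is the kind of model comparison the abstract describes.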
