Polytomous Testlet Response Models for Technology-Enhanced Innovative Items: Implications on Model Fit and Trait Inference.

Author information

Kang Hyeon-Ah, Han Suhwa, Kim Doyoung, Kao Shu-Chuan

Affiliations

University of Texas at Austin, Austin, TX, USA.

National Council of State Boards of Nursing, Chicago, IL, USA.

Publication information

Educ Psychol Meas. 2022 Aug;82(4):811-838. doi: 10.1177/00131644211032261. Epub 2021 Aug 2.

Abstract

The development of technology-enhanced innovative items calls for practical models that can describe polytomous testlet items. In this study, we evaluate four measurement models that can characterize polytomous items administered in testlets: (a) generalized partial credit model (GPCM), (b) testlet-as-a-polytomous-item model (TPIM), (c) random-effect testlet model (RTM), and (d) fixed-effect testlet model (FTM). Using data from GPCM, FTM, and RTM, we examine performance of the scoring models in multiple aspects: relative model fit, absolute item fit, significance of testlet effects, parameter recovery, and classification accuracy. The empirical analysis suggests that relative performance of the models varies substantially depending on the testlet-effect type, effect size, and trait estimator. When testlets had no or fixed effects, GPCM and FTM led to most desirable measurement outcomes. When testlets had random interaction effects, RTM demonstrated best model fit and yet showed substantially different performance in the trait recovery depending on the estimator. In particular, the advantage of RTM as a scoring model was discernable only when there existed strong random effects and the trait levels were estimated with Bayes priors. In other settings, the simpler models (i.e., GPCM, FTM) performed better or comparably. The study also revealed that polytomous scoring of testlet items has limited prospect as a functional scoring method. Based on the outcomes of the empirical evaluation, we provide practical guidelines for choosing a measurement model for polytomous innovative items that are administered in testlets.
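For readers unfamiliar with the baseline model named in the abstract, the following is a minimal sketch of how category probabilities are computed under the standard generalized partial credit model. It is not taken from the paper. The function name `gpcm_probs` and the optional `testlet_effect` argument are hypothetical; adding a person-by-testlet offset to the trait is only one common way of writing a testlet model, and the parameterization and sign convention used in the paper may differ.

```python
import math


def gpcm_probs(theta, a, steps, testlet_effect=0.0):
    """Return probabilities of scores 0..len(steps) under a GPCM.

    theta: examinee trait level
    a: item discrimination
    steps: step (threshold) parameters b_1..b_m
    testlet_effect: hypothetical offset added to theta (assumption, not
        the paper's stated parameterization); 0.0 gives the plain GPCM
    """
    # Exponent for score k is the cumulative sum of a * (theta - b_v)
    # over v = 1..k; the exponent for score 0 is fixed at 0.
    exponents = [0.0]
    for b in steps:
        exponents.append(exponents[-1] + a * (theta + testlet_effect - b))

    # Subtract the largest exponent before exponentiating to avoid overflow.
    peak = max(exponents)
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs):
    """Expected item score implied by a list of category probabilities."""
    return sum(k * p for k, p in enumerate(probs))
```

With `testlet_effect` left at its default, the function reduces to the plain GPCM, which corresponds to the no-testlet-effect condition the abstract describes; a nonzero value shifts every item in a testlet by the same amount for a given examinee.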

Similar articles

2
Computerized adaptive testing for testlet-based innovative items.
Br J Math Stat Psychol. 2022 Feb;75(1):136-157. doi: 10.1111/bmsp.12252. Epub 2021 Aug 30.
7
Testlet-Based Multidimensional Adaptive Testing.
Front Psychol. 2016 Nov 18;7:1758. doi: 10.3389/fpsyg.2016.01758. eCollection 2016.
8
F-type testlets and the effects of feedback and case-specificity.
Acad Med. 2011 Oct;86(10 Suppl):S55-8; quiz S58. doi: 10.1097/ACM.0b013e31822a6aa2.
9
Modeling Rapid Guessing Behaviors in Computer-Based Testlet Items.
Appl Psychol Meas. 2023 Jan;47(1):19-33. doi: 10.1177/01466216221125177. Epub 2022 Sep 9.
