Polytomous Testlet Response Models for Technology-Enhanced Innovative Items: Implications on Model Fit and Trait Inference.

Authors

Kang Hyeon-Ah, Han Suhwa, Kim Doyoung, Kao Shu-Chuan

Affiliations

University of Texas at Austin, Austin, TX, USA.

National Council of State Boards of Nursing, Chicago, IL, USA.

Publication information

Educ Psychol Meas. 2022 Aug;82(4):811-838. doi: 10.1177/00131644211032261. Epub 2021 Aug 2.

DOI:10.1177/00131644211032261

PMID:35754615

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9228694/

Abstract

The development of technology-enhanced innovative items calls for practical models that can describe polytomous testlet items. In this study, we evaluate four measurement models that can characterize polytomous items administered in testlets: (a) generalized partial credit model (GPCM), (b) testlet-as-a-polytomous-item model (TPIM), (c) random-effect testlet model (RTM), and (d) fixed-effect testlet model (FTM). Using data from GPCM, FTM, and RTM, we examine performance of the scoring models in multiple aspects: relative model fit, absolute item fit, significance of testlet effects, parameter recovery, and classification accuracy. The empirical analysis suggests that relative performance of the models varies substantially depending on the testlet-effect type, effect size, and trait estimator. When testlets had no or fixed effects, GPCM and FTM led to most desirable measurement outcomes. When testlets had random interaction effects, RTM demonstrated best model fit and yet showed substantially different performance in the trait recovery depending on the estimator. In particular, the advantage of RTM as a scoring model was discernable only when there existed strong random effects and the trait levels were estimated with Bayes priors. In other settings, the simpler models (i.e., GPCM, FTM) performed better or comparably. The study also revealed that polytomous scoring of testlet items has limited prospect as a functional scoring method. Based on the outcomes of the empirical evaluation, we provide practical guidelines for choosing a measurement model for polytomous innovative items that are administered in testlets.
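For readers unfamiliar with the models named in the abstract, the following is a minimal sketch of how category probabilities are commonly computed under a generalized partial credit model, with an optional person-by-testlet term of the kind a random-effect testlet model adds. The parameterization shown (discrimination `a`, step parameters `thresholds`, and a `testlet_effect` subtracted from the trait) is a standard textbook form and an assumption here; the abstract does not state the exact parameterization the authors used, and the function name `gpcm_probs` is hypothetical.

```python
import math


def gpcm_probs(theta, a, thresholds, testlet_effect=0.0):
    """Category probabilities for one polytomous item (illustrative sketch).

    theta          : latent trait level of the examinee
    a              : item discrimination
    thresholds     : step parameters b_1..b_m, giving score categories 0..m
    testlet_effect : assumed person-by-testlet term subtracted from theta;
                     0.0 reduces the sketch to the plain GPCM
    """
    eta = theta - testlet_effect
    # Cumulative sums of a * (eta - b_v); category 0 has an empty sum.
    logits = [0.0]
    for b in thresholds:
        logits.append(logits[-1] + a * (eta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]
```

Under this sketch, a positive `testlet_effect` lowers the effective trait level for every item in that testlet, which is one way local dependence among items sharing a testlet can be represented.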

