Item Response Theory Modeling for Examinee-selected Items with Rater Effect

Authors

Liu Chen-Wei, Qiu Xue-Lan, Wang Wen-Chung

Affiliations

The Chinese University of Hong Kong, Sha Tin, Hong Kong.

The Education University of Hong Kong, Tai Po, Hong Kong.

Publication information

Appl Psychol Meas. 2019 Sep;43(6):435-448. doi: 10.1177/0146621618798667. Epub 2018 Oct 8.

DOI: 10.1177/0146621618798667

PMID: 31452553

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6696873/

Abstract

Some large-scale testing requires examinees to select and answer a fixed number of items from given items (e.g., select one out of the three items). Usually, they are constructed-response items that are marked by human raters. In this examinee-selected item (ESI) design, some examinees may benefit more than others from choosing easier items to answer, and so the missing data induced by the design become missing not at random (MNAR). Although item response theory (IRT) models have recently been developed to account for MNAR data in the ESI design, they do not consider the rater effect; thus, their utility is seriously restricted. In this study, two methods are developed: the first one is a new IRT model to account for both MNAR data and rater severity simultaneously, and the second one adapts conditional maximum likelihood estimation and pairwise estimation methods to the ESI design with the rater effect. A series of simulations was then conducted to compare their performance with those of conventional IRT models that ignored MNAR data or rater severity. The results indicated a good parameter recovery for the new model. The conditional maximum likelihood estimation and pairwise estimation methods were applicable when the Rasch models fit the data, but the conventional IRT models yielded biased parameter estimates. An empirical example was given to illustrate these new initiatives.
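
To make the missing-data mechanism concrete, the following is a minimal simulation sketch, not the authors' model. It assumes a dichotomous Rasch-type response with an additive rater severity term, plus a hypothetical choice rule in which each examinee has an item-specific "affinity" and answers the item that looks most favorable. Because the same affinity drives both the choice and the response, the unobserved responses are missing not at random. All names and parameter values (`simulate_esi`, `affinity`, the difficulties and severities) are illustrative assumptions.

```python
import math
import random


def simulate_esi(n_examinees=2000, difficulties=(-1.0, 0.0, 1.0),
                 severities=(-0.5, 0.0, 0.5), seed=0):
    """Simulate one self-chosen, rater-scored item per examinee (illustrative only)."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)  # general ability
        # Hypothetical item-specific affinity: affects both choice and response.
        affinity = [rng.gauss(0.0, 1.0) for _ in difficulties]
        # Assumed choice rule: pick the item with the largest (affinity - difficulty).
        appeal = [a - b for a, b in zip(affinity, difficulties)]
        item = appeal.index(max(appeal))
        # Each response is scored by one randomly assigned rater.
        rater = rng.randrange(len(severities))
        # Rasch-type logit with an additive rater severity term.
        logit = theta + affinity[item] - difficulties[item] - severities[rater]
        p_correct = 1.0 / (1.0 + math.exp(-logit))
        score = 1 if rng.random() < p_correct else 0
        records.append({"theta": theta, "item": item, "rater": rater,
                        "affinity": affinity[item], "score": score})
    return records


records = simulate_esi()
# How often each item was chosen, and the mean affinity on the chosen items.
choice_counts = [sum(r["item"] == j for r in records) for j in range(3)]
mean_chosen_affinity = sum(r["affinity"] for r in records) / len(records)
print(choice_counts, round(mean_chosen_affinity, 2))
```

In this sketch the easiest item is chosen most often and the affinity on chosen items averages above zero, so a model that treats the unchosen items as ignorable missing data, or that ignores the rater term, would be fitted to a selected sample.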

Similar articles

Non-ignorable missingness item response theory models for choice effects in examinee-selected items.

Br J Math Stat Psychol. 2017 Nov;70(3):499-524. doi: 10.1111/bmsp.12097. Epub 2017 Apr 8.

Investigating the Impact of Item Parameter Drift for Item Response Theory Models with Mixture Distributions.

Front Psychol. 2016 Feb 24;7:255. doi: 10.3389/fpsyg.2016.00255. eCollection 2016.

Item response theory model highlighting rating scale of a rubric and rater-rubric interaction in objective structured clinical examination.

PLoS One. 2024 Sep 6;19(9):e0309887. doi: 10.1371/journal.pone.0309887. eCollection 2024.

Using Repeated Ratings to Improve Measurement Precision in Incomplete Rating Designs.

J Appl Meas. 2018;19(2):148-161.

A new item response theory model to adjust data allowing examinee choice.

PLoS One. 2018 Feb 1;13(2):e0191600. doi: 10.1371/journal.pone.0191600. eCollection 2018.

Linking essay-writing tests using many-facet models and neural automated essay scoring.

Behav Res Methods. 2024 Dec;56(8):8450-8479. doi: 10.3758/s13428-024-02485-2. Epub 2024 Aug 20.

A General Unfolding IRT Model for Multiple Response Styles.

Appl Psychol Meas. 2019 May;43(3):195-210. doi: 10.1177/0146621618762743. Epub 2018 Apr 16.

The Interaction of Ability Differences and Guessing When Modeling Differential Item Functioning With the Rasch Model: Conventional and Tailored Calibration.

Educ Psychol Meas. 2015 Aug;75(4):610-633. doi: 10.1177/0013164414554082. Epub 2014 Oct 20.

Item-Weighted Likelihood Method for Measuring Growth in Longitudinal Study With Tests Composed of Both Dichotomous and Polytomous Items.

Front Psychol. 2021 Jul 27;12:580015. doi: 10.3389/fpsyg.2021.580015. eCollection 2021.

Cited by

Exploring the Influence of Response Styles on Continuous Scale Assessments: Insights From a Novel Modeling Approach.

Educ Psychol Meas. 2025 Feb;85(1):178-214. doi: 10.1177/00131644241242789. Epub 2024 Apr 17.

Examining Nonnormal Latent Variable Distributions for Non-Ignorable Missing Data.

Appl Psychol Meas. 2021 May;45(3):159-177. doi: 10.1177/0146621621990753. Epub 2021 Feb 4.
