The Interaction of Ability Differences and Guessing When Modeling Differential Item Functioning With the Rasch Model: Conventional and Tailored Calibration.

Authors

DeMars Christine E, Jurich Daniel P

Affiliations

James Madison University, Harrisonburg, VA, USA.

National Board of Medical Examiners, Philadelphia, PA, USA.

Publication information

Educ Psychol Meas. 2015 Aug;75(4):610-633. doi: 10.1177/0013164414554082. Epub 2014 Oct 20.

DOI:10.1177/0013164414554082

PMID:29795835

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5965617/

Abstract

In educational testing, differential item functioning (DIF) statistics must be accurately estimated to ensure the appropriate items are flagged for inspection or removal. This study showed how using the Rasch model to estimate DIF may introduce considerable bias in the results when there are large group differences in ability (impact) and the data follow a three-parameter logistic model. With large group ability differences, difficult non-DIF items appeared to favor the focal group and easy non-DIF items appeared to favor the reference group. Correspondingly, the effect sizes for DIF items were biased. These effects were mitigated when data were coded as missing for item-examinee encounters in which the person measure was considerably lower than the item location. Explanation of these results is provided by illustrating how the item response function becomes differentially distorted by guessing depending on the groups' ability distributions. In terms of practical implications, results suggest that measurement practitioners should not trust the DIF estimates from the Rasch model when there is a large difference in ability and examinees are potentially able to answer items correctly by guessing, unless data from examinees poorly matched to the item difficulty are coded as missing.
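The mechanism described in the abstract can be sketched in a few lines. The block below is a minimal illustration, not the authors' procedure: it compares the three-parameter logistic (3PL) response probability with the Rasch probability for a poorly matched examinee, then recodes such encounters as missing. The function names, the guessing value `c = 0.2`, and the 1-logit `cutoff` are assumptions chosen for illustration; the abstract does not state the cutoff used in the study.

```python
import math

def p_3pl(theta, b, a=1.0, c=0.2):
    """3PL probability of a correct response: c + (1 - c) * logistic(a * (theta - b))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def p_rasch(theta, b):
    """Rasch probability: the 3PL with a = 1 and no guessing (c = 0)."""
    return p_3pl(theta, b, a=1.0, c=0.0)

def tailor(responses, thetas, difficulties, cutoff=1.0):
    """Recode a response as None (missing) when the person measure is more than
    `cutoff` logits below the item difficulty. The cutoff is an assumed value."""
    return [
        [None if theta < b - cutoff else x for x, b in zip(row, difficulties)]
        for row, theta in zip(responses, thetas)
    ]

# A low-ability examinee on a hard item: guessing lifts the 3PL probability
# above the Rasch expectation by c * (1 - P_rasch).
gap = p_3pl(-2.0, 1.0) - p_rasch(-2.0, 1.0)

# Two examinees (thetas -1.0 and 1.0) on two items (difficulties -1.0 and 1.5).
tailored = tailor([[1, 1], [1, 0]], thetas=[-1.0, 1.0], difficulties=[-1.0, 1.5])
```

In this sketch only the first examinee's response to the hard item is set to missing, since that is the one encounter where a correct answer is most plausibly a guess. In practice the person measures and item locations would come from an initial calibration before the data are recoded and recalibrated.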

Similar articles

The Interaction of Ability Differences and Guessing When Modeling Differential Item Functioning With the Rasch Model: Conventional and Tailored Calibration.

Educ Psychol Meas. 2015 Aug;75(4):610-633. doi: 10.1177/0013164414554082. Epub 2014 Oct 20.

Assessing DIF among small samples with separate calibration t and Mantel-Haenszel χ² statistics in the Rasch model.

J Appl Meas. 2013;14(4):389-99.

Consequences of Ignoring Guessing Effects on Measurement Invariance Analysis.

Appl Psychol Meas. 2021 Jun;45(4):283-296. doi: 10.1177/01466216211013915. Epub 2021 May 17.

Is the Patient Activation Measure a valid measure of osteoarthritis self-management attitudes and capabilities? Results of a Rasch analysis.

Health Qual Life Outcomes. 2020 May 5;18(1):121. doi: 10.1186/s12955-020-01364-6.

Recent advances in analysis of differential item functioning in health research using the Rasch model.

Health Qual Life Outcomes. 2017 Sep 19;15(1):181. doi: 10.1186/s12955-017-0755-0.

A New Stopping Criterion for Rasch Trees Based on the Mantel-Haenszel Effect Size Measure for Differential Item Functioning.

Educ Psychol Meas. 2023 Feb;83(1):181-212. doi: 10.1177/00131644221077135. Epub 2022 Feb 28.

Explaining differential item functioning focusing on the crucial role of external information - an example from the measurement of adolescent mental health.

BMC Med Res Methodol. 2019 Sep 5;19(1):185. doi: 10.1186/s12874-019-0828-3.

Using Rasch Analysis to Assess and Improve the Measurement Properties of a Questionnaire With Few Items: The York Binaural Hearing-Related Quality of Life (YBHRQL) Questionnaire.

Ear Hear. 2023;44(6):1526-1539. doi: 10.1097/AUD.0000000000001400. Epub 2023 Jun 26.

Modern psychometric methods for detection of differential item functioning: application to cognitive assessment measures.

Stat Med. 2000;19(11-12):1651-83. doi: 10.1002/(sici)1097-0258(20000615/30)19:11/12<1651::aid-sim453>3.0.co;2-h.

Using a Multidimensional IRT Framework to Better Understand Differential Item Functioning (DIF): A Tale of Three DIF Detection Procedures.

Educ Psychol Meas. 2017 Dec;77(6):945-970. doi: 10.1177/0013164416657137. Epub 2016 Jul 11.

Cited by

Consequences of Ignoring Guessing Effects on Measurement Invariance Analysis.

Appl Psychol Meas. 2021 Jun;45(4):283-296. doi: 10.1177/01466216211013915. Epub 2021 May 17.

Investigating Measurement Invariance by Means of Parameter Instability Tests for 2PL and 3PL Models.

Educ Psychol Meas. 2019 Apr;79(2):385-398. doi: 10.1177/0013164418777784. Epub 2018 May 24.

An Evaluation of Overall Goodness-of-Fit Tests for the Rasch Model.

Front Psychol. 2019 Jan 10;9:2710. doi: 10.3389/fpsyg.2018.02710. eCollection 2018.

All metrics are equal, but some metrics are more equal than others: A systematic search and review on the use of the term 'metric'.

PLoS One. 2018 Mar 6;13(3):e0193861. doi: 10.1371/journal.pone.0193861. eCollection 2018.

References

How Item Residual Heterogeneity Affects Tests for Differential Item Functioning.

Appl Psychol Meas. 2015 Jun;39(4):251-263. doi: 10.1177/0146621614561313. Epub 2014 Dec 11.

Real and Artificial Differential Item Functioning in Polytomous Items.

Educ Psychol Meas. 2015 Apr;75(2):185-207. doi: 10.1177/0013164414534258. Epub 2014 May 16.

Assessment of differential item functioning.

J Appl Meas. 2008;9(4):387-408.

Optimizing rating scale category effectiveness.

J Appl Meas. 2002;3(1):85-106.

Using item mean squares to evaluate fit to the Rasch model.

J Outcome Meas. 1998;2(1):66-78.
