An Approach to Scoring and Equating Tests With Binary Items: Piloting With Large-Scale Assessments.

Author information

Dimitrov Dimiter M

Affiliations

George Mason University, Fairfax, VA, USA.

National Center for Assessment, Riyadh, Saudi Arabia.

Publication information

Educ Psychol Meas. 2016 Dec;76(6):954-975. doi: 10.1177/0013164416631100. Epub 2016 Feb 16.

DOI:10.1177/0013164416631100

PMID:29795895

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5965609/

Abstract

This article describes an approach to test scoring, referred to as delta-scoring (D-scoring), for tests with dichotomously scored items. D-scoring uses information from item response theory (IRT) calibration to facilitate computations and interpretations in the context of large-scale assessments. The D-score is computed from the examinee's response vector, which is weighted by the expected difficulties (not "easiness") of the test items. The expected difficulty of each item is obtained as an analytic function of its IRT parameters. The D-scores are independent of the sample of test-takers because they are based on expected item difficulties. It is shown that the D-scale performs a good bit better than the IRT logit scale by criteria of scale intervalness. To equate D-scales, it is sufficient to rescale the item parameters, thus avoiding the tedious and error-prone procedures of mapping test characteristic curves under the method of IRT true score equating, which is often used in the practice of large-scale testing. The proposed D-scaling proved promising in its current piloting with large-scale assessments, and the hope is that it can efficiently complement IRT procedures in the practice of large-scale testing in education and psychology.
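
The weighting described in the abstract can be illustrated with a short sketch. The abstract does not state the formulas, so everything below is an assumption made for illustration: items are taken to follow a two-parameter normal-ogive model with ability distributed N(0, 1), for which the expected proportion correct is Phi(-a*b / sqrt(1 + a^2)); the expected difficulty is taken as one minus that value; and the score is taken as the difficulty-weighted sum of the responses divided by the sum of the difficulties. The function names `expected_difficulty` and `d_score` are hypothetical, and the article's exact formulas may differ.

```python
import math


def expected_difficulty(a, b):
    """Expected difficulty of one item (assumed form, not taken from the article).

    Assumes a two-parameter normal-ogive item with discrimination `a` and
    difficulty `b`, and ability theta ~ N(0, 1). Under those assumptions the
    expected proportion correct is Phi(-a*b / sqrt(1 + a^2)).
    """
    z = -a * b / math.sqrt(1.0 + a * a)
    expected_p = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    return 1.0 - expected_p


def d_score(responses, deltas):
    """Difficulty-weighted score in [0, 1] (assumed normalization).

    `responses` is a 0/1 response vector; `deltas` holds the expected
    difficulty of each item, in the same order.
    """
    if len(responses) != len(deltas):
        raise ValueError("responses and deltas must have the same length")
    total = sum(deltas)
    if total <= 0:
        raise ValueError("sum of expected difficulties must be positive")
    return sum(x * d for x, d in zip(responses, deltas)) / total


if __name__ == "__main__":
    # Hypothetical item parameters (a, b): an easy, a medium, and a hard item.
    items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
    deltas = [expected_difficulty(a, b) for a, b in items]
    print(d_score([1, 1, 0], deltas))  # easy and medium items correct
    print(d_score([0, 1, 1], deltas))  # medium and hard items correct
```

Under these assumptions, a correct answer on a harder item contributes more to the score than a correct answer on an easier one, and because the weights depend only on item parameters, the score does not depend on which examinees happened to take the test.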

Similar articles

An Approach to Scoring and Equating Tests With Binary Items: Piloting With Large-Scale Assessments.

Educ Psychol Meas. 2016 Dec;76(6):954-975. doi: 10.1177/0013164416631100. Epub 2016 Feb 16.

The Delta-Scoring Method of Tests With Binary Items: A Note on True Score Estimation and Equating.

Educ Psychol Meas. 2018 Oct;78(5):805-825. doi: 10.1177/0013164417724187. Epub 2017 Aug 4.

Developing Multistage Tests Using D-Scoring Method.

Educ Psychol Meas. 2019 Oct;79(5):988-1008. doi: 10.1177/0013164419841428. Epub 2019 Apr 22.

Using item response theory to calibrate the Headache Impact Test (HIT) to the metric of traditional headache scales.

Qual Life Res. 2003 Dec;12(8):981-1002. doi: 10.1023/a:1026123400242.

Comparison of unweighted and item response theory-based weighted sum scoring for the Nine-Questions Depression-Rating Scale in the Northern Thai Dialect.

BMC Med Res Methodol. 2022 Oct 12;22(1):268. doi: 10.1186/s12874-022-01744-0.

Item Response Theory True Score Equating for the Bifactor Model Under the Common-Item Nonequivalent Groups Design.

Appl Psychol Meas. 2022 Sep;46(6):479-493. doi: 10.1177/01466216221108995. Epub 2022 Jun 17.

Testing item response theory invariance of the standardized Quality-of-life Disease Impact Scale (QDIS(®)) in acute coronary syndrome patients: differential functioning of items and test.

Qual Life Res. 2015 Aug;24(8):1809-22. doi: 10.1007/s11136-015-0916-8. Epub 2015 Jan 20.

Efficiency Analysis of Item Response Theory Kernel Equating for Mixed-Format Tests.

Appl Psychol Meas. 2023 Nov;47(7-8):496-512. doi: 10.1177/01466216231209757. Epub 2023 Oct 19.

A primer on standardized testing: .

J Chiropr Educ. 2019 Oct;33(2):151-163. doi: 10.7899/JCE-18-22. Epub 2019 Jun 6.

A Note on the D-Scoring Method Adapted for Polytomous Test Items.

Educ Psychol Meas. 2019 Jun;79(3):545-557. doi: 10.1177/0013164418786014. Epub 2018 Jul 4.

Cited by

Latent D-Scoring Modeling: Estimation of Item and Person Parameters.

Educ Psychol Meas. 2021 Apr;81(2):388-404. doi: 10.1177/0013164420941147. Epub 2020 Jul 13.

The Response Vector for Mastery Method of Standard Setting.

Educ Psychol Meas. 2022 Aug;82(4):719-746. doi: 10.1177/00131644211032388. Epub 2021 Jul 21.

On the Choice of the Item Response Model for Scaling PISA Data: Model Selection Based on Information Criteria and Quantifying Model Uncertainty.

Entropy (Basel). 2022 May 27;24(6):760. doi: 10.3390/e24060760.

Testing for Differential Item Functioning Under the D-Scoring Method.

Educ Psychol Meas. 2022 Feb;82(1):107-121. doi: 10.1177/00131644211001524. Epub 2021 Mar 26.

Reading Comprehension Tests for Children: Test Equating and Specific Age-Interval Reports.

Front Psychol. 2021 Sep 10;12:662192. doi: 10.3389/fpsyg.2021.662192. eCollection 2021.

The Delta-Scoring Method of Tests With Binary Items: A Note on True Score Estimation and Equating.

Educ Psychol Meas. 2018 Oct;78(5):805-825. doi: 10.1177/0013164417724187. Epub 2017 Aug 4.

Modeling of Item Response Functions Under the D-Scoring Method.

Educ Psychol Meas. 2020 Feb;80(1):126-144. doi: 10.1177/0013164419854176. Epub 2019 Jun 10.

An Application of Reliability Estimation in Longitudinal Designs Through Modeling Item-Specific Error Variance.

Educ Psychol Meas. 2019 Dec;79(6):1038-1063. doi: 10.1177/0013164419843162. Epub 2019 Apr 22.

Developing Multistage Tests Using D-Scoring Method.

Educ Psychol Meas. 2019 Oct;79(5):988-1008. doi: 10.1177/0013164419841428. Epub 2019 Apr 22.

A Note on the D-Scoring Method Adapted for Polytomous Test Items.

Educ Psychol Meas. 2019 Jun;79(3):545-557. doi: 10.1177/0013164418786014. Epub 2018 Jul 4.
