Detecting Rater Biases in Sparse Rater-Mediated Assessment Networks.

Authors

Wind Stefanie A, Ge Yuan

Affiliation

The University of Alabama, Tuscaloosa, AL, USA.

Publication information

Educ Psychol Meas. 2021 Oct;81(5):996-1022. doi: 10.1177/0013164420988108. Epub 2021 Jan 19.

DOI:10.1177/0013164420988108

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8377343/

Abstract

Practical constraints in rater-mediated assessments limit the availability of complete data. Instead, most scoring procedures include one or two ratings for each performance, with overlapping performances across raters or linking sets of multiple-choice items to facilitate model estimation. These incomplete scoring designs present challenges for detecting rater biases, or differential rater functioning (DRF). The purpose of this study is to illustrate and explore the sensitivity of DRF indices in realistic sparse rating designs that have been documented in the literature that include different types and levels of connectivity among raters and students. The results indicated that it is possible to detect DRF in sparse rating designs, but the sensitivity of DRF indices varies across designs. We consider the implications of our findings for practice related to monitoring raters in performance assessments.
