A Bayesian hierarchical latent trait model for estimating rater bias and reliability in large-scale performance assessment.

Affiliation

Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia.

Publication information

PLoS One. 2018 Apr 3;13(4):e0195297. doi: 10.1371/journal.pone.0195297. eCollection 2018.

DOI:10.1371/journal.pone.0195297
PMID:29614129
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5882162/
Abstract

We propose a novel approach to modelling rater effects in scoring-based assessment. The approach is based on a Bayesian hierarchical model and simulations from the posterior distribution. We apply it to large-scale essay assessment data over a period of 5 years. Empirical results suggest that the model provides a good fit for both the total scores and when applied to individual rubrics. We estimate the median impact of rater effects on the final grade to be ± 2 points on a 50 point scale, while 10% of essays would receive a score at least ± 5 different from their actual quality. Most of the impact is due to rater unreliability, not rater bias.
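
As a rough illustration of the distinction the abstract draws between rater bias and rater unreliability, the sketch below runs a simple forward simulation. It is not the authors' Bayesian hierarchical model and performs no posterior inference. Everything in it is an assumption made for illustration: the function name `simulate_rater_impact`, two raters per essay, normally distributed effects, and the standard deviations `bias_sd` and `noise_sd`. Each rater carries a fixed bias (systematic severity or leniency) and adds independent noise to every score (unreliability); the final grade is the mean of the raters' scores, so the essay's true quality cancels out of the error.

```python
import random
import statistics


def simulate_rater_impact(n_essays=20000, n_raters=200, raters_per_essay=2,
                          bias_sd=1.0, noise_sd=3.0, seed=0):
    """Simulate how rater bias and rater noise shift a final grade.

    All parameter values are hypothetical; they are not estimates
    taken from the paper.
    """
    rng = random.Random(seed)
    # One fixed bias per rater: systematic severity (-) or leniency (+).
    rater_bias = [rng.gauss(0.0, bias_sd) for _ in range(n_raters)]

    bias_part, noise_part = [], []
    for _ in range(n_essays):
        raters = rng.sample(range(n_raters), raters_per_essay)
        # The final grade is the mean of the raters' scores, so its error is
        # the mean bias plus the mean noise of the assigned raters.
        bias_part.append(statistics.fmean(rater_bias[r] for r in raters))
        noise_part.append(
            statistics.fmean(rng.gauss(0.0, noise_sd) for _ in raters))

    # Absolute deviation of the final grade from the essay's true quality.
    # Clipping to the ends of the grading scale is ignored here.
    abs_total = sorted(abs(b + e) for b, e in zip(bias_part, noise_part))
    return {
        "median_abs_impact": statistics.median(abs_total),
        "p90_abs_impact": abs_total[int(0.9 * len(abs_total))],
        "bias_variance": statistics.pvariance(bias_part),
        "noise_variance": statistics.pvariance(noise_part),
    }


if __name__ == "__main__":
    for name, value in simulate_rater_impact().items():
        print(f"{name}: {value:.2f}")
```

With these assumed settings the noise variance dominates the bias variance, which mirrors the qualitative pattern the abstract reports; the numbers themselves carry no empirical meaning.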

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef23/5882162/f6370eac9bb0/pone.0195297.g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef23/5882162/ee5f8c15d0f5/pone.0195297.g002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef23/5882162/9ce3e8fadb61/pone.0195297.g003.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef23/5882162/1f0f11647aa2/pone.0195297.g004.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef23/5882162/c5fff6b08e08/pone.0195297.g005.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef23/5882162/12c634743ef6/pone.0195297.g006.jpg

Similar articles

1
A Bayesian hierarchical latent trait model for estimating rater bias and reliability in large-scale performance assessment.
PLoS One. 2018 Apr 3;13(4):e0195297. doi: 10.1371/journal.pone.0195297. eCollection 2018.
2
Inter-rater reliability of pressure ulcer staging: ordinal probit Bayesian hierarchical model that allows for uncertain rater response.
Stat Med. 2007 Nov 10;26(25):4602-18. doi: 10.1002/sim.2877.
3
A Hierarchical Rater Model for Longitudinal Data.
Multivariate Behav Res. 2017 Sep-Oct;52(5):576-592. doi: 10.1080/00273171.2017.1342202. Epub 2017 Aug 28.
4
Examining rating quality in writing assessment: rater agreement, error, and accuracy.
J Appl Meas. 2012;13(4):321-35.
5
Multi-faceted Rasch measurement and bias patterns in EFL writing performance assessment.
Psychol Rep. 2013 Apr;112(2):469-85. doi: 10.2466/03.11.PR0.112.2.469-485.
6
Assessment of Differential Rater Functioning in Latent Classes with New Mixture Facets Models.
Multivariate Behav Res. 2017 May-Jun;52(3):391-402. doi: 10.1080/00273171.2017.1299615. Epub 2017 Mar 22.
7
Modeling rater diagnostic skills in binary classification processes.
Stat Med. 2018 Feb 20;37(4):557-571. doi: 10.1002/sim.7530. Epub 2017 Nov 2.
8
A new item response theory model for rater centrality using a hierarchical rater model approach.
Behav Res Methods. 2022 Aug;54(4):1854-1868. doi: 10.3758/s13428-021-01699-y. Epub 2021 Nov 1.
9
Rater Model Using Signal Detection Theory for Latent Differential Rater Functioning.
Multivariate Behav Res. 2019 Jul-Aug;54(4):492-504. doi: 10.1080/00273171.2018.1522496. Epub 2018 Dec 17.
10
Assessing agreement between multiple raters with missing rating information, applied to breast cancer tumour grading.
PLoS One. 2008 Aug 13;3(8):e2925. doi: 10.1371/journal.pone.0002925.