A Bayesian hierarchical latent trait model for estimating rater bias and reliability in large-scale performance assessment.

Affiliation

Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia.

Publication information

PLoS One. 2018 Apr 3;13(4):e0195297. doi: 10.1371/journal.pone.0195297. eCollection 2018.

Abstract

We propose a novel approach to modelling rater effects in scoring-based assessment. The approach is based on a Bayesian hierarchical model and simulations from the posterior distribution. We apply it to large-scale essay assessment data collected over a period of 5 years. Empirical results suggest that the model fits both the total scores and the scores on individual rubrics well. We estimate the median impact of rater effects on the final grade to be ±2 points on a 50-point scale, while 10% of essays would receive a score that differs from their actual quality by at least 5 points. Most of the impact is due to rater unreliability, not rater bias.
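The distinction between rater bias and rater unreliability can be illustrated with a small simulation. The sketch below is not the paper's Bayesian hierarchical model. It assumes a much simpler additive model in which each rating equals the essay's true quality plus a persistent per-rater bias plus independent per-rating noise, and the final grade is the mean of the ratings. The function name `simulate_rater_impact`, all parameter values (`bias_sd`, `noise_sd`, two raters per essay) and the omission of clipping to the 50-point scale are assumptions made for illustration only; they are not taken from the abstract.

```python
import random
import statistics


def simulate_rater_impact(n_essays=20000, n_raters=200, bias_sd=1.0,
                          noise_sd=2.5, raters_per_essay=2, seed=0):
    """Simulate |final grade - true quality| under a simple additive rater model.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    # Bias: a persistent severity/leniency offset, drawn once per rater.
    biases = [rng.gauss(0.0, bias_sd) for _ in range(n_raters)]
    errors = []
    for _ in range(n_essays):
        raters = rng.sample(range(n_raters), raters_per_essay)
        # Each rating deviates from true quality by the rater's bias plus
        # fresh noise (unreliability); true quality cancels out of the error.
        deviations = [biases[r] + rng.gauss(0.0, noise_sd) for r in raters]
        # The final grade is assumed to be the mean of the ratings.
        errors.append(abs(statistics.fmean(deviations)))
    errors.sort()
    return {
        "median_abs_error": statistics.median(errors),
        "p90_abs_error": errors[int(0.9 * len(errors))],
    }


both = simulate_rater_impact()
bias_only = simulate_rater_impact(noise_sd=0.0)
noise_only = simulate_rater_impact(bias_sd=0.0)
print("both:", both)
print("bias only:", bias_only)
print("noise only:", noise_only)
```

Under these assumed settings, where the noise standard deviation is larger than the bias standard deviation, switching off the noise shrinks the error far more than switching off the bias. This mirrors the qualitative pattern the abstract reports, but the numbers the sketch produces are not estimates from the study.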

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ef23/5882162/f6370eac9bb0/pone.0195297.g001.jpg
