Department of Statistical Modelling, Institute of Computer Science of the Czech Academy of Sciences, Prague, Czech Republic.
Institute for Research and Development of Education, Faculty of Education, Charles University, Prague, Czech Republic.
PLoS One. 2018 Oct 5;13(10):e0203002. doi: 10.1371/journal.pone.0203002. eCollection 2018.
Ratings are present in many areas of assessment, including peer review of research proposals and journal articles, teacher observations, university admissions, and the selection of new hires. One feature of any rating process with multiple raters is that different raters often assign different scores to the same assessee, raising the potential for bias and for inconsistencies related to rater or assessee covariates. This paper analyzes disparities in ratings of internal and external applicants to teaching positions using applicant data from Spokane Public Schools. We first test for biases in ratings while accounting for measures of teacher applicant qualifications and quality. We then develop model-based inter-rater reliability (IRR) estimates that account for multiple sources of measurement error and the hierarchical structure of the data, and that allow us to test whether covariates, such as applicant status, moderate IRR. We find that applicants external to the district receive lower ratings on job applications than internal applicants. This gap in ratings remains significant even after including measures of qualifications and quality such as experience, state licensure scores, and estimated teacher value added. With model-based IRR, we further show that consistency between raters is significantly lower when they rate external applicants. We conclude by discussing policy implications and possible applications of our model-based IRR estimate for hiring and selection practices in and beyond the teacher labor market.
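The paper's model-based IRR estimates are built on a hierarchical model with covariate-moderated variance components; the core intuition, that reliability is the share of rating variance attributable to true assessee differences rather than rater disagreement, can be sketched with a simple one-way intraclass correlation (ICC). The sketch below uses entirely simulated data with an assumed larger rater-noise variance for external applicants; it is an illustration of the concept, not the authors' estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def icc1(ratings):
    """One-way random-effects ICC(1).

    ratings: array of shape (n_assessees, k_raters).
    Returns (MSB - MSW) / (MSB + (k - 1) * MSW), the estimated share of
    rating variance due to true between-assessee differences.
    """
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    msb = k * ((row_means - grand) ** 2).sum() / (n - 1)           # between-assessee mean square
    msw = ((ratings - row_means[:, None]) ** 2).sum() / (n * (k - 1))  # within-assessee mean square
    return (msb - msw) / (msb + (k - 1) * msw)

# Simulated example (hypothetical numbers): each applicant has a true
# quality score; external applicants are rated with more rater noise,
# which lowers between-rater consistency for that group.
n, k = 200, 4
quality = rng.normal(0.0, 1.0, size=(n, 1))
internal = quality + rng.normal(0.0, 0.5, size=(n, k))
external = quality + rng.normal(0.0, 1.5, size=(n, k))

icc_int = icc1(internal)
icc_ext = icc1(external)
print(f"ICC internal: {icc_int:.2f}, ICC external: {icc_ext:.2f}")
```

With these assumed variances the internal-applicant ICC lands near 0.8 and the external-applicant ICC near 0.3, mirroring the paper's qualitative finding that raters agree less when scoring external applicants.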