Jodi M. Casabianca, J. R. Lockwood, Daniel F. McCaffrey
The University of Texas at Austin, Austin, TX, USA.
Educational Testing Service, Princeton, NJ, USA.
Educ Psychol Meas. 2015 Apr;75(2):311-337. doi: 10.1177/0013164414539163. Epub 2014 Jun 22.
Observations and ratings of classroom teaching and interactions collected over time are susceptible to trends in both the quality of instruction and rater behavior. These trends have potential implications for inferences about teaching and for study design. We use scores on the Classroom Assessment Scoring System-Secondary (CLASS-S) protocol from 458 middle school teachers over a 2-year period to study changes over time in (a) the average quality of teaching for the population of teachers, (b) the average severity of the population of raters, and (c) the severity of individual raters. To obtain these estimates and assess them in the context of other factors that contribute to the variability in scores, we develop an augmented G study model that is broadly applicable for modeling sources of variability in classroom observation ratings data collected over time. In our data, trends in teaching quality were small. Rater drift was very large during raters' initial days of observation and persisted throughout nearly 2 years of scoring. Raters did not converge to a common level of severity; using our model, we estimate that variability among raters actually increased over the course of the study. Variance decompositions based on the model show that trends are a modest source of variance relative to overall rater effects, rater errors on specific lessons, and residual error. The discussion provides possible explanations for trends and rater divergence, as well as implications for study designs that collect ratings over time.
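The finding that raters diverge rather than converge can be illustrated with a toy calculation. The sketch below is not the authors' augmented G study model. It assumes, purely for illustration, that each rater's severity drifts linearly over time (a hypothetical `level + slope * day` form), and every rater value and variance component in it is invented. It shows how rater-specific slopes make between-rater variance grow with time, and how variance components are turned into shares of total variance.

```python
import statistics


def severity(level, slope, day):
    """Severity of one rater on a given day, ASSUMING linear drift (illustrative only)."""
    return level + slope * day


def between_rater_variance(levels, slopes, day):
    """Population variance of rater severities on a given day."""
    return statistics.pvariance(
        [severity(lv, sl, day) for lv, sl in zip(levels, slopes)]
    )


def variance_shares(components):
    """Convert variance components into proportions of total variance."""
    total = sum(components.values())
    return {name: value / total for name, value in components.items()}


# Hypothetical raters: starting severities and per-day drift (not from the study).
levels = [-0.2, 0.0, 0.2]
slopes = [-0.01, 0.0, 0.01]

# Between-rater variance at the start and after 50 days of scoring.
for day in (0, 50):
    print(day, round(between_rater_variance(levels, slopes, day), 4))

# Hypothetical variance components (made-up numbers, chosen only to show the arithmetic).
shares = variance_shares(
    {
        "teacher": 0.25,
        "rater": 0.10,
        "rater_trend": 0.03,
        "rater_by_lesson": 0.20,
        "residual": 0.42,
    }
)
for name, share in shares.items():
    print(name, round(share, 3))
```

Because the slopes differ across raters, the spread of severities widens as `day` grows, which is the qualitative pattern the abstract describes. A real analysis would estimate these quantities from the ratings data rather than fix them by hand.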