Kim Grace Young-Suk, Schatschneider Christopher, Wanzek Jeanne, Gatlin Brandy, Al Otaiba Stephanie
University of California, Irvine, 3500 Education Building, Irvine, CA 92697, USA.
Read Writ. 2017 Jun;30(6):1287-1310. doi: 10.1007/s11145-017-9724-6. Epub 2017 Feb 6.
We examined how raters and tasks contribute to measurement error in writing evaluation, and how many raters and tasks are needed to reach desirable reliabilities of .90 and .80 for children in Grades 3 and 4. A total of 211 children (102 boys) each completed three tasks in the narrative genre and three in the expository genre. Their written compositions were evaluated with methods widely used for developing writers: holistic scoring, productivity, and curriculum-based writing scores. Results showed that 54% and 52% of the variance in narrative and expository compositions, respectively, was attributable to true individual differences in writing. Students' scores varied substantially by task (30.44% and 28.61% of the variance, respectively) but not by rater. Reaching a reliability of .90 required multiple tasks and multiple raters, whereas a reliability of .80 could be reached with a single rater and multiple tasks. These findings have important implications for reliably evaluating children's writing skills, given that writing is typically evaluated with a single task and a single rater in classrooms and even in state accountability systems.
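The question of how many raters and tasks are needed is the kind of projection that generalizability theory handles with a decision (D) study. The abstract does not state the exact model, so the following is only a minimal sketch. It assumes a fully crossed person x task x rater design and uses the relative generalizability coefficient Erho^2 = var_p / (var_p + var_pt/n_t + var_pr/n_r + var_ptr_e/(n_t * n_r)). The inputs are partly hypothetical. The 54% person share and the 30.44% task share for narrative compositions come from the abstract. Treating the task share as a person-by-task component, setting the rater component to zero, and assigning the remainder to the residual are illustrative assumptions, not the authors' variance estimates. The function names `g_coefficient` and `min_tasks` are likewise hypothetical.

```python
def g_coefficient(var_p, var_pt, var_pr, var_ptr_e, n_tasks, n_raters):
    """Relative G coefficient for an assumed crossed person x task x rater design."""
    # Relative error variance: each interaction component is averaged over
    # the facets it involves, so more tasks or raters shrink it.
    error = (var_pt / n_tasks
             + var_pr / n_raters
             + var_ptr_e / (n_tasks * n_raters))
    return var_p / (var_p + error)


def min_tasks(target, var_p, var_pt, var_pr, var_ptr_e, n_raters, max_tasks=50):
    """Smallest number of tasks reaching `target` for a fixed number of raters."""
    for n_t in range(1, max_tasks + 1):
        if g_coefficient(var_p, var_pt, var_pr, var_ptr_e, n_t, n_raters) >= target:
            return n_t
    return None  # target not reachable within max_tasks


# ASSUMED inputs (proportions of total variance), for illustration only.
VAR_P = 0.54                               # person share reported for narrative
VAR_PT = 0.3044                            # task share, treated here as person x task
VAR_PR = 0.0                               # raters contributed little; assumed zero
VAR_RES = 1.0 - VAR_P - VAR_PT - VAR_PR    # remainder assigned to residual

if __name__ == "__main__":
    for target in (0.80, 0.90):
        for n_r in (1, 2):
            n_t = min_tasks(target, VAR_P, VAR_PT, VAR_PR, VAR_RES, n_r)
            print(f"target={target:.2f} raters={n_r} tasks needed={n_t}")
```

Under these assumed inputs, the sketch returns 4 tasks (one rater) or 3 tasks (two raters) for .80, and 8 or 7 tasks for .90. These figures illustrate how the projection works. They are not the study's own D-study results.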