Keller-Margulis Milena A, Mercer Sterett H, Thomas Erin L
Department of Psychological, Health, and Learning Sciences.
Department of Educational and Counselling Psychology and Special Education, University of British Columbia.
Sch Psychol Q. 2016 Sep;31(3):383-392. doi: 10.1037/spq0000126. Epub 2015 Aug 31.
The purpose of this study was to examine the reliability of written expression curriculum-based measurement (WE-CBM) in the context of universal screening from a generalizability theory framework. Students in second through fifth grade (n = 145) participated in the study. The sample included 54% female students, 49% White students, 23% African American students, 17% Hispanic students, 8% Asian students, and 3% of students identified as 2 or more races. Of the sample, 8% were English Language Learners and 6% were students receiving special education. Three WE-CBM probes were administered for 7 min each at 3 time points across 1 year. Writing samples were scored for commonly used WE-CBM metrics (e.g., correct minus incorrect word sequences; CIWS). Results suggest that nearly half the variance in WE-CBM is related to unsystematic error and that conventional screening procedures (i.e., the use of one 3-min sample) do not yield scores with adequate reliability for relative or absolute decisions about student performance. In most grades, three 3-min writing samples (or 2 longer duration samples) were required for adequate reliability for relative decisions, and three 7-min writing samples would not yield adequate reliability for relative decisions about within-year student growth. Implications and recommendations are discussed. (PsycINFO Database Record)
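The link between the number of writing samples and reliability can be illustrated with a small decision-study calculation. The sketch below assumes a simple crossed persons-by-samples design, which is likely simpler than the design analyzed in the study. The function names, the variance components, and the 0.70 threshold are hypothetical values chosen for illustration only; they are not taken from the article.

```python
def relative_g(var_person, var_residual, n_samples):
    """Generalizability coefficient for relative decisions (rank-ordering students).

    Assumes a crossed persons x samples design in which only the
    person-by-sample/residual variance contributes to relative error.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return var_person / (var_person + var_residual / n_samples)


def absolute_phi(var_person, var_sample, var_residual, n_samples):
    """Dependability coefficient for absolute decisions (comparison to a cut score).

    Absolute error also includes the sample main effect (differences in
    sample difficulty), so this value is never larger than relative_g.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    error = (var_sample + var_residual) / n_samples
    return var_person / (var_person + error)


def samples_needed(var_person, var_residual, target, max_samples=20):
    """Smallest number of samples whose relative coefficient reaches `target`."""
    for n in range(1, max_samples + 1):
        if relative_g(var_person, var_residual, n) >= target:
            return n
    return None  # target not reachable within max_samples


if __name__ == "__main__":
    # Hypothetical variance components (proportions of total variance).
    VAR_PERSON, VAR_SAMPLE, VAR_RESIDUAL = 0.50, 0.05, 0.45
    for n in (1, 2, 3):
        rel = relative_g(VAR_PERSON, VAR_RESIDUAL, n)
        phi = absolute_phi(VAR_PERSON, VAR_SAMPLE, VAR_RESIDUAL, n)
        print(f"{n} sample(s): relative = {rel:.2f}, absolute = {phi:.2f}")
```

Averaging over more samples divides the error variance by the number of samples, which is why a single short sample can fall short of a reliability threshold that several samples together can meet.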