Center for Promoting Research to Practice, Lehigh University, United States of America.
J Sch Psychol. 2024 Aug;105:101319. doi: 10.1016/j.jsp.2024.101319. Epub 2024 May 14.
Computer adaptive tests have become popular assessments to screen students for academic risk. Research is emerging regarding their use as progress monitoring tools to measure response to instruction. We evaluated the accuracy of the trend-line decision rule when applied to outcomes from a frequently used reading computer adaptive test (i.e., Star Reading [SR]) and a frequently used math computer adaptive test (i.e., Star Math [SM]). Analyses of extant SR and SM data were conducted to inform conditions for simulations to determine the number of assessments required to yield sufficient sensitivity (i.e., the probability of recommending an instructional change when a change was warranted) and specificity (i.e., the probability of recommending maintaining an intervention when a change was not warranted) when comparing performance to goal lines based upon a future target score (i.e., a benchmark) as well as upon normative comparisons (50th and 75th percentiles). The extant dataset of SR outcomes consisted of monthly progress monitoring data from 993 Grade 3, 804 Grade 4, and 709 Grade 5 students from multiple states in the northwestern United States. Data for SM were also drawn from the northwestern United States and contained outcomes from 518 Grade 3, 474 Grade 4, and 391 Grade 5 students. Grade-level samples were predominantly White (range = 59.89%-67.72%), followed by Latinx students (range = 9.65%-15.94%). Results of simulations suggest that when data were collected once a month, seven, eight, and nine observations were required to support low-stakes decisions with SR for Grades 3, 4, and 5, respectively. For SM, nine, ten, and eight observations were required for Grades 3, 4, and 5, respectively. Given the length of time required to support reasonably accurate decisions, recommendations to consider other types of assessments and decision-making frameworks for academic progress monitoring are provided.
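For readers unfamiliar with the trend-line decision rule evaluated in the abstract, the sketch below illustrates one common formulation of its logic: an ordinary least squares trend line is fit to the observed progress monitoring scores, and its slope is compared with the slope of the goal (aim) line connecting the baseline score to the target score. This is a minimal illustration under those assumptions, not the authors' simulation code; the function name, example scores, and goal values are hypothetical.

import numpy as np

def trend_line_decision(scores, baseline, goal, n_total_occasions):
    """Illustrative trend-line decision rule (a sketch, not the study's code).

    scores: observed progress monitoring scores, one per occasion
    baseline: starting score anchoring the goal line
    goal: target score expected after n_total_occasions
    """
    occasions = np.arange(1, len(scores) + 1)
    # Fit an ordinary least squares trend line through the observations;
    # np.polyfit with deg=1 returns (slope, intercept).
    slope, _intercept = np.polyfit(occasions, scores, deg=1)
    # Slope of the goal (aim) line from baseline to the target score.
    goal_slope = (goal - baseline) / n_total_occasions
    # If observed growth falls short of expected growth, flag a change.
    return "change instruction" if slope < goal_slope else "maintain intervention"

# Hypothetical monthly scores for one student.
print(trend_line_decision(
    scores=[480, 492, 488, 501, 505, 510, 507],
    baseline=480, goal=560, n_total_occasions=9))

In the study's framing, sensitivity is the proportion of simulated cases in which this rule recommends "change instruction" when a change was truly warranted, and specificity is the proportion recommending "maintain intervention" when it was not; the number of observations supplied to the rule drives both.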