Thomas Jaime, Cook Thomas D, Klein Alice, Starkey Prentice, DeFlorio Lydia
1 Mathematica Policy Research, Oakland, CA, USA.
2 Northwestern University, Evanston, IL, USA.
Eval Rev. 2018 Jun;42(3):318-357. doi: 10.1177/0193841X18786818. Epub 2018 Aug 6.
Policy makers face dilemmas when choosing a policy, program, or practice to implement. Researchers in education, public health, and other fields have proposed a sequential approach to identifying interventions worthy of broader adoption, involving pilot, efficacy, effectiveness, and scale-up studies. In this article, we examine a scale-up of an early math intervention to the state level, using a cluster randomized controlled trial. The intervention, Pre-K Mathematics, has produced robust positive effects on children's math ability in prior pilot, efficacy, and effectiveness studies. In the current study, we ask if it remains effective at a larger scale in a heterogeneous collection of pre-K programs that plausibly represent all low-income families with a child of pre-K age who live in California. We find that Pre-K Mathematics remains effective at the state level, with positive and statistically significant effects (effect size on the Early Childhood Longitudinal Study, Birth Cohort Mathematics Assessment = .30, p < .01). In addition, we develop a framework of the dimensions of scale-up to explain why effect sizes might decrease as scale increases. Using this framework, we compare the causal estimates from the present study to those from earlier, smaller studies. Consistent with our framework, we find that effect sizes have decreased over time. We conclude with a discussion of the implications of our study for how we think about the external validity of causal relationships.
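The abstract reports the impact as a standardized effect size (.30) but does not say how it was computed. The sketch below shows one common way to compute a standardized mean difference: Hedges' g, which divides the treatment-control mean difference by the pooled standard deviation and applies a small-sample correction. This is an illustration, not the authors' estimator. The study's actual estimate is likely regression-adjusted and accounts for children being clustered within randomized programs. The function names, the sample data, and the `design_effect` helper (which shows how cluster randomization inflates sampling variance) are hypothetical.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(treatment, control):
    """Hedges' g: mean difference over pooled SD, with a small-sample correction.

    Illustrative only: ignores covariate adjustment and clustering.
    """
    n_t, n_c = len(treatment), len(control)
    # Pooled variance built from each group's sample (n - 1) variance
    pooled_var = (
        (n_t - 1) * variance(treatment) + (n_c - 1) * variance(control)
    ) / (n_t + n_c - 2)
    d = (mean(treatment) - mean(control)) / math.sqrt(pooled_var)
    # Hedges' correction factor J slightly shrinks d in small samples
    j = 1 - 3 / (4 * (n_t + n_c) - 9)
    return d * j


def design_effect(cluster_size, icc):
    """Variance inflation from randomizing clusters (e.g., classrooms) instead of children.

    cluster_size: average number of children per cluster (hypothetical input)
    icc: intraclass correlation of the outcome (hypothetical input)
    """
    return 1 + (cluster_size - 1) * icc


# Hypothetical scores for a tiny example
g = standardized_mean_difference([2, 4, 6], [1, 3, 5])
print(round(g, 3))  # 0.4
print(round(design_effect(20, 0.1), 2))  # 2.9
```

With an average cluster size of 20 and an ICC of 0.1, the design effect of 2.9 means a cluster-randomized trial needs roughly 2.9 times as many children as an individually randomized one to reach the same precision. This is one reason clustered trials such as this one must account for clustering when testing whether an effect like .30 is statistically significant.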