Department of Educational Psychology, University of Wisconsin-Madison.
Department of Statistics, University of Wisconsin-Madison.
Multivariate Behav Res. 2021 Nov-Dec;56(6):829-852. doi: 10.1080/00273171.2020.1808437. Epub 2020 Aug 28.
There is growing interest in using machine learning (ML) methods for causal inference because of their (nearly) automatic and flexible ability to model key quantities such as the propensity score or the outcome model. Unfortunately, most ML methods for causal inference have been studied in single-level settings where all individuals are independent of each other, and there is little work on using these methods with clustered or nested data, a common setting in education studies. This paper investigates one particular ML method based on random forests, known as Causal Forests, for estimating treatment effects in multilevel observational data. We conduct simulation studies under different types of multilevel data, including two-level, three-level, and cross-classified data. The simulations show that when the ML method is supplemented with propensity scores estimated from multilevel models that account for the clustered/hierarchical structure, the modified ML method outperforms preexisting methods in a wide variety of settings. We conclude by estimating the effect of private math lessons in the Trends in International Mathematics and Science Study data, a large-scale educational assessment where students are nested within schools.
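The role of the cluster-aware propensity score can be illustrated with a small simulation. The sketch below is not the authors' code or design: the data-generating process, the sample sizes, and the helper `fit_logistic` are all hypothetical, and a ridge penalty on school-specific intercepts is used only as a rough stand-in for a random-intercept multilevel logistic model. It compares how well a single-level and a cluster-aware propensity model recover the true treatment probabilities in simulated two-level data.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, penalty, n_iter=100):
    """Penalised logistic regression fitted by Newton's method.

    penalty[k] is the ridge weight on coefficient k (0 means unpenalised).
    Hypothetical helper written for this illustration only.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        grad = X.T @ (y - p) - penalty * beta
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + np.diag(penalty)
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Assumed two-level data: 50 schools with 40 students each.
rng = np.random.default_rng(0)
n_clusters, m = 50, 40
cluster = np.repeat(np.arange(n_clusters), m)
u = rng.normal(size=n_clusters)        # unobserved school-level effect
x = rng.normal(size=n_clusters * m)    # student-level covariate

# Treatment uptake depends on the student covariate and on the school.
p_true = sigmoid(0.5 * x + u[cluster])
t = rng.binomial(1, p_true).astype(float)

# Single-level propensity model: ignores the school structure.
X_single = np.column_stack([np.ones_like(x), x])
ps_single = sigmoid(X_single @ fit_logistic(X_single, t, np.zeros(2)))

# Cluster-aware model: one shrunken intercept per school
# (ridge-penalised dummies approximating a random intercept).
dummies = (cluster[:, None] == np.arange(n_clusters)).astype(float)
X_multi = np.column_stack([x, dummies])
pen = np.r_[0.0, np.ones(n_clusters)]
ps_multi = sigmoid(X_multi @ fit_logistic(X_multi, t, pen))

# Mean absolute error against the true treatment probabilities.
err_single = np.mean(np.abs(ps_single - p_true))
err_multi = np.mean(np.abs(ps_multi - p_true))
print(f"single-level error:  {err_single:.3f}")
print(f"cluster-aware error: {err_multi:.3f}")
```

Propensity scores estimated this way would then be supplied to the forest rather than estimated internally. In the R package grf, for example, `causal_forest` accepts externally estimated propensities through its `W.hat` argument; whether this matches the authors' exact pipeline is not stated in the abstract.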