Department of Mathematics and Statistics, Colby College, Waterville, Maine, USA.
Department of Health Care Policy, Harvard Medical School, Boston, Massachusetts, USA.
Health Serv Res. 2021 Oct;56(5):932-941. doi: 10.1111/1475-6773.13666. Epub 2021 May 12.
To define confounding bias in difference-in-differences studies and compare regression- and matching-based estimators designed to correct bias due to observed confounders.
We simulated data from linear models that incorporated different confounding relationships: time-invariant covariates with a time-varying effect on the outcome, time-varying covariates with a constant effect on the outcome, and time-varying covariates with a time-varying effect on the outcome. We considered a simple setting that is common in the applied literature: treatment is introduced at a single time point and there is no unobserved treatment effect heterogeneity.
We compared the bias and root mean squared error of treatment effect estimates from six model specifications, including simple linear regression models and matching techniques.
Simulation code is provided for replication.
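The authors' replication code is not reproduced here. As a rough, independent illustration of the kind of comparison described above, the sketch below simulates one of the three confounding relationships (a time-invariant covariate with a time-varying effect on the outcome) in a two-period design and compares the bias and root mean squared error of an unadjusted and a covariate-adjusted estimator. All function names, coefficient values, sample sizes, and the two-period first-difference setup are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def simulate_did(rng, n=500, tau=1.0):
    """Draw one two-period data set (all coefficients are assumed values)."""
    d = rng.integers(0, 2, size=n)              # treated-group indicator
    x = rng.normal(loc=0.5 * d, scale=1.0)      # time-invariant covariate, imbalanced across groups
    # The effect of x on the outcome changes from 1.0 (pre) to 2.0 (post).
    y_pre = 1.0 + 0.5 * d + 1.0 * x + rng.normal(size=n)
    y_post = 1.3 + 0.5 * d + 2.0 * x + tau * d + rng.normal(size=n)
    return d, x, y_post - y_pre                 # first difference removes unit-level constants


def ols_coef(design, response):
    """Least-squares coefficients for response ~ design."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


def run_simulation(n_reps=500, n=500, tau=1.0, seed=0):
    """Return (bias, RMSE) for an unadjusted and a covariate-adjusted estimator."""
    rng = np.random.default_rng(seed)
    est = np.empty((n_reps, 2))
    for r in range(n_reps):
        d, x, dy = simulate_did(rng, n, tau)
        ones = np.ones(n)
        # Unadjusted: change in outcome regressed on group only.
        est[r, 0] = ols_coef(np.column_stack([ones, d]), dy)[1]
        # Adjusted: also include x, which in two periods matches an x-by-period interaction.
        est[r, 1] = ols_coef(np.column_stack([ones, d, x]), dy)[2]
    bias = est.mean(axis=0) - tau
    rmse = np.sqrt(((est - tau) ** 2).mean(axis=0))
    return {
        "unadjusted": (bias[0], rmse[0]),
        "adjusted": (bias[1], rmse[1]),
    }


if __name__ == "__main__":
    for name, (bias, rmse) in run_simulation().items():
        print(f"{name}: bias={bias:.3f}, rmse={rmse:.3f}")
```

Under these assumed parameters the unadjusted estimator should be biased by roughly the change in the covariate's effect (1.0) times the group difference in its mean (0.5), while the adjusted estimator should be approximately unbiased.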
Confounders in difference-in-differences are covariates that change differently over time in the treated and comparison groups or have a time-varying effect on the outcome. When such a confounding variable is measured, appropriately adjusting for this confounder (ie, including the confounder in a regression model that is consistent with the causal model) can provide unbiased estimates with optimal SE. However, when a time-varying confounder is affected by treatment, recovering an unbiased causal effect using difference-in-differences is difficult.
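To make the two sources of confounding concrete, consider a worked two-period example. The linear model and all symbols below are assumptions introduced for illustration and are not the authors' notation: $d_i$ indicates the treated group, $t \in \{0, 1\}$ the period, $x_{it}$ a covariate unaffected by treatment, $\beta_t$ its effect on the outcome in period $t$, and $\tau$ the treatment effect.

```latex
\begin{align*}
Y_{it} &= \alpha + \gamma d_i + \lambda t + \beta_t x_{it} + \tau d_i t + \varepsilon_{it},
  \qquad \mathrm{E}[\varepsilon_{it} \mid d_i, x_{i0}, x_{i1}] = 0 \\[4pt]
\text{DiD} &= \bigl(\mathrm{E}[Y_{i1} - Y_{i0} \mid d_i = 1]\bigr)
            - \bigl(\mathrm{E}[Y_{i1} - Y_{i0} \mid d_i = 0]\bigr) \\
           &= \tau
            + \beta_1 \bigl(\mathrm{E}[x_{i1} \mid d_i = 1] - \mathrm{E}[x_{i1} \mid d_i = 0]\bigr)
            - \beta_0 \bigl(\mathrm{E}[x_{i0} \mid d_i = 1] - \mathrm{E}[x_{i0} \mid d_i = 0]\bigr)
\end{align*}
```

In this sketch, if the covariate is time-invariant ($x_{i1} = x_{i0} = x_i$), the bias term reduces to $(\beta_1 - \beta_0)\bigl(\mathrm{E}[x_i \mid d_i = 1] - \mathrm{E}[x_i \mid d_i = 0]\bigr)$, which is nonzero only when the covariate's effect varies over time and its mean differs between groups. If instead the effect is constant ($\beta_1 = \beta_0 = \beta$), the bias is $\beta$ times the difference between groups in the mean change of the covariate, which is nonzero only when the covariate evolves differently in the two groups.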
Confounding in difference-in-differences is more complicated than in cross-sectional settings, from which techniques and intuition to address observed confounding cannot be imported wholesale. Instead, analysts should begin by postulating a causal model that relates covariates, both time-varying and those with time-varying effects on the outcome, to treatment. This causal model will then guide the specification of an appropriate analytical model (eg, using regression or matching) that can produce unbiased treatment effect estimates. We emphasize the importance of thoughtful incorporation of covariates to address confounding bias in difference-in-differences studies.