Stuart Elizabeth A, Huskamp Haiden A, Duckworth Kenneth, Simmons Jeffrey, Song Zirui, Chernew Michael, Barry Colleen L
Johns Hopkins Bloomberg School of Public Health.
Harvard Medical School.
Health Serv Outcomes Res Methodol. 2014 Dec 1;14(4):166-182. doi: 10.1007/s10742-014-0123-z.
Difference-in-difference (DD) methods are a common strategy for evaluating the effects of policies or programs that are instituted at a particular point in time, such as the implementation of a new law. The DD method compares changes over time in a group unaffected by the policy intervention to the changes over time in a group affected by the policy intervention, and attributes the "difference-in-differences" to the effect of the policy. DD methods provide unbiased effect estimates if the trend over time would have been the same between the intervention and comparison groups in the absence of the intervention. However, a concern with DD models is that the intervention and comparison groups may differ in ways that would affect their trends over time, or their compositions may change over time. Propensity score methods are commonly used to handle this type of confounding in other non-experimental studies, but the particular considerations when using them in the context of a DD model have not been well investigated. In this paper, we describe the use of propensity scores in conjunction with DD models, in particular investigating a propensity score weighting strategy that weights the four groups (defined by time and intervention status) to be balanced on a set of characteristics. We discuss the conceptual issues associated with this approach, including the need for caution when selecting variables to include in the propensity score model, particularly given the multiple time point nature of the analysis. We illustrate the ideas and method with an application estimating the effects of a new payment and delivery system innovation (an accountable care organization model called the "Alternative Quality Contract" (AQC) implemented by Blue Cross Blue Shield of Massachusetts) on health plan enrollees' out-of-pocket mental health service expenditures. We find no evidence that the AQC affected out-of-pocket mental health service expenditures of enrollees.
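To make the four-group weighting concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the propensity score model, and with several covariates one would typically fit a multinomial model for membership in the four groups. Assuming instead a single categorical covariate, the sketch estimates the propensity scores as cell proportions (a saturated model), weights each record by P(target group | x) / P(own group | x), and takes the post-period intervention group as the target population, which is one natural choice. The group labels, the function names, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict

# The four DD cells, defined by intervention status and period.
GROUPS = ("comp_pre", "comp_post", "int_pre", "int_post")
# Assumption: the target population is the intervention group after the
# policy, so every group is weighted to resemble it on the covariate.
TARGET = "int_post"


def four_group_weights(records):
    """Weight each (group, x, y) record so all groups match TARGET on x.

    Saturated propensity model (assumption, valid for one categorical x):
    P(group = g | x) is the share of records at level x that are in group g,
    and the weight is P(TARGET | x) / P(own group | x).
    """
    n_cell = Counter((g, x) for g, x, _ in records)
    n_x = Counter(x for _, x, _ in records)
    weights = []
    for g, x, _ in records:
        p_target = n_cell[(TARGET, x)] / n_x[x]
        p_own = n_cell[(g, x)] / n_x[x]
        weights.append(p_target / p_own)  # equals 1.0 for TARGET records
    return weights


def weighted_means(records, weights):
    """Weighted mean outcome within each of the four groups."""
    num, den = defaultdict(float), defaultdict(float)
    for (g, _, y), w in zip(records, weights):
        num[g] += w * y
        den[g] += w
    return {g: num[g] / den[g] for g in GROUPS}


def dd_estimate(means):
    """(intervention post - pre) minus (comparison post - pre)."""
    return (means["int_post"] - means["int_pre"]) - (
        means["comp_post"] - means["comp_pre"]
    )


def build_example():
    """Hypothetical toy data: y = base(x) + 5 * post + 2 * (intervention and post)."""
    base = {"low": 10.0, "high": 30.0}
    counts = {  # (n_low, n_high): the mix of x differs across the four groups
        "comp_pre": (8, 2),
        "comp_post": (7, 3),
        "int_pre": (6, 4),
        "int_post": (3, 7),
    }
    records = []
    for g, (n_low, n_high) in counts.items():
        period_shift = 5.0 if g.endswith("post") else 0.0
        effect = 2.0 if g == "int_post" else 0.0
        for x, n in (("low", n_low), ("high", n_high)):
            records.extend([(g, x, base[x] + period_shift + effect)] * n)
    return records


if __name__ == "__main__":
    data = build_example()
    unweighted = dd_estimate(weighted_means(data, [1.0] * len(data)))
    weighted = dd_estimate(weighted_means(data, four_group_weights(data)))
    print(f"unweighted DD = {unweighted:.2f}")  # 6.00: biased by the shifting x mix
    print(f"weighted DD   = {weighted:.2f}")    # 2.00: the built-in effect
```

In the toy data the intervention group's share of "high" records rises by 30 points while the comparison group's rises by only 10, so the unweighted DD absorbs that compositional shift. After weighting, every group carries the target group's 30/70 low/high mix and the DD recovers the built-in effect of 2. The covariate used here is fixed by construction; in a real analysis it should be a characteristic that the intervention itself cannot change, in line with the caution about variable selection above.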