Andrew M. Ryan, James F. Burgess, Justin B. Dimick
University of Michigan School of Public Health, 1415 Washington Heights, Ann Arbor, MI.
Veterans Affairs Boston Health Care System, US Department of Veterans Affairs; Boston University School of Public Health, Boston, MA.
Health Serv Res. 2015 Aug;50(4):1211-35. doi: 10.1111/1475-6773.12270. Epub 2014 Dec 11.
To evaluate the effects of specification choices on the accuracy of estimates in difference-in-differences (DID) models.
Process-of-care quality data from Hospital Compare between 2003 and 2009.
We performed a Monte Carlo simulation experiment to estimate the effect of an imaginary policy on quality. The experiment covered three scenarios in which the probability of treatment was (1) unrelated to pre-intervention performance; (2) positively correlated with pre-intervention levels of performance; and (3) positively correlated with pre-intervention trends in performance. We estimated alternative DID models that varied in the choice of data intervals, the comparison group, and the method of obtaining inference. We assessed estimator bias as the mean absolute deviation between estimated program effects and their true value, and we evaluated the accuracy of inference through statistical power and rates of false rejection of the null hypothesis.
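The logic of the experiment can be sketched in a few lines. The following is a minimal, self-contained illustration, not the paper's actual data-generating process: all parameter values, the logistic selection rule, and the level-dependent trend are assumptions chosen to show how correlating treatment with pre-intervention levels biases a naive DID estimate of a placebo (zero-effect) policy.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_did(n_units=200, true_effect=0.0, level_corr=True):
    """One Monte Carlo draw: a two-period panel and the 2x2 DID estimate.

    Illustrative parameters only -- not the paper's data-generating process.
    """
    baseline = rng.normal(70, 10, n_units)        # pre-period quality score
    if level_corr:
        # Scenario (2): treatment probability rises with pre-period level
        p_treat = 1.0 / (1.0 + np.exp(-(baseline - 70.0) / 5.0))
    else:
        # Scenario (1): treatment unrelated to pre-period performance
        p_treat = np.full(n_units, 0.5)
    treated = rng.random(n_units) < p_treat
    # Level-dependent trend: high-baseline units improve faster, so a naive
    # DID on data selected on levels is biased even when the effect is zero
    trend = 2.0 + 0.05 * baseline
    post = baseline + trend + true_effect * treated + rng.normal(0, 3, n_units)
    gain = post - baseline
    return gain[treated].mean() - gain[~treated].mean()

# Bias criterion as in the abstract: mean absolute deviation from the truth
ests = np.array([simulate_did() for _ in range(500)])
bias = np.abs(ests - 0.0).mean()
```

Re-running with `level_corr=False` removes the selection, and the estimates center on the true effect of zero.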
Performance of alternative specifications varied dramatically when the probability of treatment was correlated with pre-intervention levels or trends. In these cases, propensity score matching resulted in much more accurate point estimates. The use of permutation tests resulted in lower false rejection rates for the highly biased estimators, but the use of clustered standard errors resulted in slightly lower false rejection rates for the matching estimators.
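The permutation-based inference referred to above can be illustrated with a short sketch: reassign treatment labels at the unit level, recompute the DID statistic under each reassignment, and compare the observed statistic to the resulting null distribution. This is the general idea only; the paper's exact permutation scheme may differ, and the toy data here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def did_stat(pre, post, treated):
    """2x2 difference-in-differences statistic."""
    gain = post - pre
    return gain[treated].mean() - gain[~treated].mean()

def permutation_pvalue(pre, post, treated, n_perm=1000):
    """Two-sided permutation test: shuffle treatment labels across units
    and locate the observed DID statistic in the permutation distribution."""
    observed = did_stat(pre, post, treated)
    perm_stats = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(treated)
        perm_stats[i] = did_stat(pre, post, shuffled)
    return np.mean(np.abs(perm_stats) >= abs(observed))

# Toy data with no true effect, so the test should rarely reject
pre = rng.normal(70, 10, 100)
post = pre + 2 + rng.normal(0, 3, 100)
treated = rng.random(100) < 0.5
p = permutation_pvalue(pre, post, treated)
```

Because the permutation distribution is built under the sharp null of no effect, this procedure controls false rejection without relying on asymptotic standard-error formulas.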
When treatment and comparison groups differed on pre-intervention levels or trends, our results supported DID specifications that use matching to obtain more accurate point estimates, together with clustered standard errors or permutation tests for more reliable inference. Based on our findings, we propose a checklist for DID analysis.
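A minimal stand-in for the matching step can be sketched as follows. For simplicity this matches each treated unit to its nearest comparison unit on standardized pre-intervention levels and trends, rather than on an estimated propensity score as in the paper; it is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def match_on_pretrends(pre_levels, pre_trends, treated):
    """Nearest-neighbor match (with replacement) of each treated unit to the
    comparison unit closest on pre-intervention level and trend.

    A simple stand-in for propensity score matching, not the paper's method.
    """
    X = np.column_stack([pre_levels, pre_trends])
    X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize both dimensions
    treated_idx = np.flatnonzero(treated)
    control_idx = np.flatnonzero(~treated)
    matches = []
    for i in treated_idx:
        dist = np.linalg.norm(X[control_idx] - X[i], axis=1)
        matches.append(control_idx[np.argmin(dist)])
    return treated_idx, np.array(matches)

# Toy usage: 50 hypothetical units with illustrative levels and trends
pre_levels = rng.normal(70, 10, 50)
pre_trends = rng.normal(2, 1, 50)
treated = rng.random(50) < 0.4
t_idx, m_idx = match_on_pretrends(pre_levels, pre_trends, treated)
```

The DID estimate is then computed on the matched pairs only, so treated and comparison groups are balanced on pre-intervention levels and trends by construction.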