Golinelli Daniela, Ridgeway Greg, Rhoades Harmony, Tucker Joan, Wenzel Suzanne
RAND, 1776 Main Street, Santa Monica, CA 90407, USA
Health Serv Outcomes Res Methodol. 2012 Jun;12(2-3):104-118. doi: 10.1007/s10742-012-0090-1.
The quality of propensity scores is traditionally measured by how well they make the distributions of covariates in the treatment and control groups match, which we refer to as "good balance". Good balance guarantees less biased estimates of the treatment effect. However, achieving good balance comes at a cost: the variance of the estimates increases because the effective sample size shrinks, either through the introduction of propensity score weights or through dropping cases in propensity score matching. In this paper, we investigate whether it is better to optimize the balance or to settle for a less than optimal balance and use double robust estimation to adjust for the remaining differences. We compare treatment effect estimates from regression, propensity score weighting, and double robust estimation, with varying levels of effort expended to achieve balance, using data from a study of differences in outcomes by HIV status among heterosexually active homeless men residing in Los Angeles. Because data collection is so costly for this population, it is important to find an alternative estimation method that does not reduce the effective sample size as much as methods that aggressively optimize balance. Results from a simulation study suggest that there are instances in which we can obtain more precise treatment effect estimates without increasing bias too much by combining regression with propensity score weights that achieve a less than optimal balance. There is a bias-variance tradeoff at work in propensity score estimation: every step toward better balance usually means an increase in variance, and at some point a marginal decrease in bias may not be worth the associated increase in variance.
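The tradeoff described above can be illustrated with a minimal Python sketch. It is not the authors' analysis or code. The data are simulated, the propensity score is taken as known rather than estimated, the weights target the effect on the treated (treated cases get weight 1, controls get the odds p / (1 - p)), the effective sample size uses the Kish approximation, and the double robust estimate is a weighted least-squares regression of the outcome on treatment and the covariate. All function names and the data-generating model are assumptions made for illustration.

```python
import numpy as np


def att_weights(p, treated):
    """Treated cases keep weight 1; controls are weighted by the odds p / (1 - p)."""
    p = np.asarray(p, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    return np.where(treated, 1.0, p / (1.0 - p))


def effective_sample_size(w):
    """Kish approximation: (sum of weights)^2 / (sum of squared weights)."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)


def weighted_mean_difference(y, treated, w):
    """Propensity-score-weighted difference in mean outcomes."""
    mean_treated = np.average(y[treated], weights=w[treated])
    mean_control = np.average(y[~treated], weights=w[~treated])
    return mean_treated - mean_control


def double_robust_estimate(y, treated, X, w):
    """Weighted least squares of y on an intercept, the treatment indicator,
    and the covariates; returns the coefficient on the treatment indicator."""
    design = np.column_stack([np.ones(len(y)), treated.astype(float), X])
    sqrt_w = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coef[1]


# Simulated toy data (assumption): one confounder x, true treatment effect 1.0.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(x - 0.5)))  # propensity score, assumed known here
treated = rng.random(n) < p
y = 1.0 * treated + 2.0 * x + rng.normal(size=n)

w = att_weights(p, treated)
controls = ~treated
ess_controls = effective_sample_size(w[controls])

naive = y[treated].mean() - y[controls].mean()
weighted = weighted_mean_difference(y, treated, w)
dr = double_robust_estimate(y, treated, x[:, None], w)

print(f"controls: {controls.sum()}, effective sample size: {ess_controls:.1f}")
print(f"naive: {naive:.3f}  weighted: {weighted:.3f}  double robust: {dr:.3f}")
```

In this sketch the effective sample size of the weighted controls is always smaller than the raw number of controls unless all weights are equal, which is the variance cost of weighting that the abstract refers to. Weights that pursue balance less aggressively would be less variable and would leave more of the adjustment to the regression step.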