Centre for MEGA Epidemiology, School of Population Health, University of Melbourne, Melbourne, Australia.
Stat Med. 2012 Jul 10;31(15):1617-32. doi: 10.1002/sim.4504. Epub 2012 Feb 24.
Propensity score methods are increasingly used to estimate the effect of a treatment or exposure on an outcome in non-randomised studies. We focus on one such method, stratification on the propensity score, and compare it with inverse-probability weighting by the propensity score. The propensity score (the conditional probability of receiving the treatment given the observed covariates) is usually unknown and must be estimated from the data. However, the variance estimators for treatment effect estimates that are typically used in practice do not take into account that the propensity score itself has been estimated. By deriving the asymptotic marginal variance of the stratified estimate of treatment effect, correctly taking into account the estimation of the propensity score, we show that routinely used variance estimators are likely to produce confidence intervals that are too conservative when the propensity score model includes variables that predict (cause) the outcome but only weakly predict the treatment. In contrast, a comparison with the analogous marginal variance for the inverse-probability weighted (IPW) estimator shows that routinely used variance estimators for the IPW estimator are likely to produce confidence intervals that are almost always too conservative. Because exact calculation of the asymptotic marginal variance is likely to be complex, particularly for the stratified estimator, we suggest that bootstrap estimates of variance be used in practice.
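The abstract rests on three ingredients: a stratified propensity score estimator, an IPW estimator, and a bootstrap variance that accounts for the propensity score having been estimated. The Python/NumPy sketch below illustrates them under assumptions that are not taken from the paper: a logistic propensity model fitted by Newton-Raphson, strata formed from quintiles of the estimated score, a normalised (Hajek-type) IPW estimator, and a nonparametric bootstrap that re-fits the propensity model in every resample. It also computes one common "routine" stratified standard error that treats the strata as fixed; this is an assumed choice, not necessarily the estimator analysed by the authors. All function names and the simulated data are hypothetical.

```python
import numpy as np

def fit_logistic(X, t, n_iter=25):
    """Logistic regression of treatment t on X (X includes an intercept) via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (t - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # observed information
        beta += np.linalg.solve(hess, grad)
    return beta

def propensity(X, t):
    """Estimated propensity score: fitted P(treatment = 1 | covariates)."""
    return 1.0 / (1.0 + np.exp(-X @ fit_logistic(X, t)))

def assign_strata(ps, n_strata=5):
    """Assign each unit to a stratum cut at quantiles of the estimated score."""
    edges = np.quantile(ps, np.linspace(0.0, 1.0, n_strata + 1))
    return np.clip(np.searchsorted(edges, ps, side="right") - 1, 0, n_strata - 1)

def stratified_ate(y, t, ps, n_strata=5):
    """Within-stratum differences in means, weighted by stratum size."""
    strata = assign_strata(ps, n_strata)
    est = 0.0
    for s in range(n_strata):
        m = strata == s
        est += m.mean() * (y[m & (t == 1)].mean() - y[m & (t == 0)].mean())
    return est

def naive_stratified_se(y, t, ps, n_strata=5):
    """Assumed 'routine' SE: treats the strata (and so the estimated score) as fixed."""
    strata = assign_strata(ps, n_strata)
    var = 0.0
    for s in range(n_strata):
        m = strata == s
        y1, y0 = y[m & (t == 1)], y[m & (t == 0)]
        var += m.mean() ** 2 * (y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return np.sqrt(var)

def ipw_ate(y, t, ps):
    """Normalised inverse-probability-weighted difference in means."""
    w1, w0 = t / ps, (1 - t) / (1 - ps)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

def bootstrap_se(y, t, X, estimator, n_boot=200, seed=0):
    """Nonparametric bootstrap SE; the propensity model is re-fitted in each resample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        ps_b = propensity(X[idx], t[idx])  # re-estimate the score, do not reuse it
        reps[b] = estimator(y[idx], t[idx], ps_b)
    return reps.std(ddof=1)

# Hypothetical simulated data: x1 strongly predicts treatment; x2 strongly
# predicts the outcome but only weakly predicts treatment. True effect = 1.
rng = np.random.default_rng(2012)
n = 2000
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.8 * x1 + 0.1 * x2)))).astype(float)
y = 1.0 * t + 0.5 * x1 + 2.0 * x2 + rng.normal(size=n)

ps = propensity(X, t)
est_strat = stratified_ate(y, t, ps)
est_ipw = ipw_ate(y, t, ps)
se_strat_naive = naive_stratified_se(y, t, ps)
se_strat_boot = bootstrap_se(y, t, X, stratified_ate)
se_ipw_boot = bootstrap_se(y, t, X, ipw_ate)
print(f"stratified: {est_strat:.3f}  routine SE {se_strat_naive:.3f}  bootstrap SE {se_strat_boot:.3f}")
print(f"IPW:        {est_ipw:.3f}  bootstrap SE {se_ipw_boot:.3f}")
```

The key design choice is that `bootstrap_se` calls `propensity` inside the resampling loop. Reusing the full-sample scores would treat the propensity score as known, which is exactly the simplification the abstract identifies in routinely used variance estimators. A single simulated dataset does not guarantee any particular ordering between the routine and bootstrap standard errors; it only shows where each quantity comes from.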