Austin Peter C, Stuart Elizabeth A
1 Institute for Clinical Evaluative Sciences, Toronto, Ontario, Canada.
2 Institute of Health Management, Policy and Evaluation, University of Toronto.
Stat Methods Med Res. 2017 Aug;26(4):1654-1670. doi: 10.1177/0962280215584401. Epub 2015 Apr 30.
There is increasing interest in estimating the causal effects of treatments using observational data. Propensity score matching methods are frequently used to adjust for differences in observed characteristics between treated and control individuals in observational studies. Survival or time-to-event outcomes occur frequently in the medical literature, but the use of propensity score methods in survival analysis has not been thoroughly investigated. This paper compares two approaches for estimating the Average Treatment Effect (ATE) on survival outcomes: Inverse Probability of Treatment Weighting (IPTW) and full matching. The performance of these methods was compared in an extensive set of simulations that varied the extent of confounding and the amount of misspecification of the propensity score model. We found that both IPTW and full matching estimated marginal hazard ratios with negligible bias when the ATE was the target estimand and the treatment-selection process was weak to moderate. However, when the treatment-selection process was strong, both methods resulted in biased estimation of the true marginal hazard ratio, even when the propensity score model was correctly specified. When the propensity score model was correctly specified, bias tended to be lower for full matching than for IPTW. These biases, and the differences between the two methods, appeared to be due to extreme weights generated by each method. Both methods tended to produce more extreme weights as the magnitude of the effects of covariates on treatment selection increased. Furthermore, more extreme weights were observed for IPTW than for full matching. However, when the propensity score model was correctly specified, the poorer performance of both methods in the presence of a strong treatment-selection process was mitigated by the use of IPTW with restriction and full matching with a caliper restriction.
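To illustrate why a strong treatment-selection process produces extreme weights, the following is a minimal sketch of the standard ATE weights used in IPTW: 1/e for treated subjects and 1/(1 - e) for controls, where e is the propensity score. It is not the authors' simulation code. The function names, the toy data, and the restriction cut-offs of 0.05 and 0.95 are assumptions made for illustration only; the abstract does not say how the restriction was defined.

```python
import numpy as np


def ate_weights(treated, ps):
    """Return ATE weights: 1/e for treated subjects, 1/(1 - e) for controls."""
    treated = np.asarray(treated, dtype=float)
    ps = np.asarray(ps, dtype=float)
    if np.any((ps <= 0) | (ps >= 1)):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    return treated / ps + (1.0 - treated) / (1.0 - ps)


def restrict_by_ps(ps, lower=0.05, upper=0.95):
    """Return a boolean mask keeping subjects with lower <= ps <= upper.

    The cut-offs are hypothetical; they are not taken from the paper.
    """
    ps = np.asarray(ps, dtype=float)
    return (ps >= lower) & (ps <= upper)


# Toy data (illustrative only): 1 = treated, 0 = control.
treated = np.array([1, 1, 0, 0, 1, 0])
ps = np.array([0.5, 0.02, 0.5, 0.99, 0.8, 0.2])

weights = ate_weights(treated, ps)
keep = restrict_by_ps(ps)

print("all weights:       ", weights)
print("after restriction: ", weights[keep])
```

In this toy data, the treated subject with a propensity score of 0.02 and the control with a score of 0.99 receive weights of roughly 50 and 100, while the remaining subjects receive weights of 2 or less. Dropping subjects outside the assumed range removes both extreme weights, at the cost of changing the population to which the estimate applies.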