Institute for Clinical Evaluative Sciences, Toronto, Ontario, Canada.
Stat Med. 2013 Jul 20;32(16):2837-49. doi: 10.1002/sim.5705. Epub 2012 Dec 12.
Propensity score methods are increasingly being used to reduce or minimize the effects of confounding when estimating the effects of treatments, exposures, or interventions from observational or non-randomized data. Under the assumption of no unmeasured confounders, previous research has shown that propensity score methods allow for unbiased estimation of linear treatment effects (e.g., differences in means or proportions). However, time-to-event outcomes occur frequently in biomedical research, and there is a paucity of research into the performance of different propensity score methods for estimating the effect of treatment on such outcomes. Furthermore, propensity score methods allow for the estimation of marginal or population-average treatment effects. We conducted an extensive series of Monte Carlo simulations to examine the performance of four approaches for estimating marginal hazard ratios: propensity score matching (1:1 greedy nearest-neighbor matching within propensity score calipers), stratification on the propensity score, inverse probability of treatment weighting (IPTW) using the propensity score, and covariate adjustment using the propensity score. We found that both propensity score matching and IPTW using the propensity score allow for the estimation of marginal hazard ratios with minimal bias. Of these two approaches, IPTW using the propensity score resulted in estimates with lower mean squared error when estimating the effect of treatment in the treated. Stratification on the propensity score and covariate adjustment using the propensity score resulted in biased estimation of both marginal and conditional hazard ratios. Applied researchers are encouraged to use propensity score matching and IPTW using the propensity score when estimating the relative effect of treatment on time-to-event outcomes.
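The two approaches the abstract recommends can be illustrated with a minimal Python sketch. It is not the authors' simulation code. It assumes the propensity scores have already been estimated (for example, by logistic regression), and all function and parameter names are hypothetical. The caliper is applied on the raw propensity score scale and treated subjects are processed in the order given; both are simplifying assumptions, since the abstract does not state the caliper scale or the matching order. Fitting the Cox model in the matched or weighted sample to obtain the marginal hazard ratio is not shown.

```python
from typing import List, Tuple


def greedy_caliper_match(
    ps_treated: List[float],
    ps_control: List[float],
    caliper: float,
) -> List[Tuple[int, int]]:
    """1:1 greedy nearest-neighbor matching without replacement.

    Returns (treated_index, control_index) pairs. A treated subject
    stays unmatched when no unused control lies within the caliper.
    Assumption: the caliper is on the raw propensity score scale.
    """
    available = set(range(len(ps_control)))
    pairs: List[Tuple[int, int]] = []
    for i, p in enumerate(ps_treated):
        if not available:
            break
        # Nearest unused control; ties are broken by the lower index.
        j = min(available, key=lambda k: (abs(ps_control[k] - p), k))
        if abs(ps_control[j] - p) <= caliper:
            pairs.append((i, j))
            available.remove(j)  # each control is used at most once
    return pairs


def iptw_weights(
    treated: List[int],
    ps: List[float],
    estimand: str = "ATE",
) -> List[float]:
    """Inverse probability of treatment weights.

    ATE: 1/e for treated subjects, 1/(1 - e) for controls.
    ATT: 1 for treated subjects, e/(1 - e) for controls.
    Assumes every propensity score e lies strictly between 0 and 1.
    """
    if estimand not in ("ATE", "ATT"):
        raise ValueError("estimand must be 'ATE' or 'ATT'")
    weights: List[float] = []
    for z, e in zip(treated, ps):
        if estimand == "ATE":
            weights.append(z / e + (1 - z) / (1 - e))
        else:
            weights.append(z + (1 - z) * e / (1 - e))
    return weights
```

The ATT weights target the effect of treatment in the treated, which is the estimand for which the abstract reports lower mean squared error with IPTW; matching on the treated subjects targets the same estimand.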