Institute for Clinical Evaluative Sciences, Toronto, Ont., Canada.
Stat Med. 2011 May 20;30(11):1292-301. doi: 10.1002/sim.4200. Epub 2011 Feb 21.
Propensity-score matching allows one to reduce the effects of treatment-selection bias or confounding when estimating treatment effects from observational data. Some authors have suggested that methods of inference appropriate for independent samples can be used to assess the statistical significance of treatment effects when using propensity-score matching. Indeed, many authors in the applied medical literature use methods for independent samples when making inferences about treatment effects in propensity-score matched samples. Dichotomous outcomes are common in healthcare research. In this study, we used Monte Carlo simulations to compare inferences about risk differences (or absolute risk reductions) in propensity-score matched samples obtained with statistical methods for independent samples versus statistical methods for paired samples. We found that, compared with methods for independent samples, methods for paired samples resulted in: (i) empirical type I error rates that were closer to the advertised (nominal) rate; (ii) empirical coverage rates of 95 per cent confidence intervals that were closer to the advertised rate; (iii) narrower 95 per cent confidence intervals; and (iv) estimated standard errors that more closely reflected the sampling variability of the estimated risk difference. The discrepancies between the empirical and advertised performance of methods for independent samples were greater when the treatment-selection process was stronger than when it was weaker. We recommend using statistical methods for paired samples when using propensity-score matched samples to make inferences about the effect of treatment on the absolute reduction in the probability of an event occurring.
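The abstract does not state the estimators used. As a minimal sketch, assuming 1:1 matching and a binary outcome, the Python below contrasts a Wald confidence interval for the risk difference computed as if the treated and control groups were independent samples with a paired Wald interval built from the discordant pairs, and adds McNemar's test of no effect. These are standard textbook estimators for paired binary data, not necessarily the exact ones used in the study; the function name `matched_pair_rd_inference` and the `(y_treated, y_control)` input format are hypothetical.

```python
import math
from statistics import NormalDist


def matched_pair_rd_inference(pairs, alpha=0.05):
    """Independent-samples vs paired-samples inference for a risk difference
    in a 1:1 propensity-score matched sample (illustrative sketch).

    pairs: sequence of (y_treated, y_control) tuples with 0/1 outcomes,
    one tuple per matched pair (hypothetical input format).
    """
    n = len(pairs)
    if n == 0:
        raise ValueError("need at least one matched pair")

    # Discordant pairs: b = only the treated subject had the event,
    # c = only the matched control subject had the event.
    b = sum(1 for yt, yc in pairs if yt == 1 and yc == 0)
    c = sum(1 for yt, yc in pairs if yt == 0 and yc == 1)

    p_t = sum(yt for yt, _ in pairs) / n  # event risk among treated subjects
    p_c = sum(yc for _, yc in pairs) / n  # event risk among matched controls
    rd = p_t - p_c                        # equals (b - c) / n

    # Unpaired Wald SE: ignores the matching and treats groups as independent.
    se_indep = math.sqrt(p_t * (1 - p_t) / n + p_c * (1 - p_c) / n)
    # Paired Wald SE: based on discordant pairs, so it accounts for the
    # within-pair correlation of outcomes induced by matching.
    se_paired = math.sqrt(max(0.0, (b + c) - (b - c) ** 2 / n)) / n

    z = NormalDist().inv_cdf(1 - alpha / 2)

    # McNemar's test (chi-square with 1 df, no continuity correction).
    if b + c > 0:
        chi2 = (b - c) ** 2 / (b + c)
        p_value = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
    else:
        chi2, p_value = 0.0, 1.0

    return {
        "rd": rd,
        "se_indep": se_indep,
        "ci_indep": (rd - z * se_indep, rd + z * se_indep),
        "se_paired": se_paired,
        "ci_paired": (rd - z * se_paired, rd + z * se_paired),
        "mcnemar_chi2": chi2,
        "mcnemar_p": p_value,
    }


# Toy example: 3 concordant-event pairs, 4 pairs where only the treated
# subject had the event, 1 where only the control did, 2 with no events.
pairs = [(1, 1)] * 3 + [(1, 0)] * 4 + [(0, 1)] * 1 + [(0, 0)] * 2
res = matched_pair_rd_inference(pairs)
print(res["rd"], res["se_indep"], res["se_paired"])
```

In this toy example the risk difference is 0.3, the unpaired standard error is sqrt(0.045) ≈ 0.212, and the paired standard error is sqrt(0.041) ≈ 0.202. The paired estimate is smaller because outcomes within pairs are positively correlated, which mirrors the narrower paired intervals reported in the abstract; with real data the size of the gap will depend on how strongly matching induces within-pair correlation.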