Institute for Clinical Evaluative Sciences, G1 06, 2075 Bayview Avenue, Toronto, Ontario, M4N 3M5 Canada.
Med Decis Making. 2009 Nov-Dec;29(6):661-77. doi: 10.1177/0272989X09341755. Epub 2009 Aug 14.
The propensity score is a balancing score: conditional on the propensity score, treated and untreated subjects have the same distribution of observed baseline characteristics. Four methods of using the propensity score have been described in the literature: stratification on the propensity score, propensity score matching, inverse probability of treatment weighting using the propensity score, and covariate adjustment using the propensity score. However, the relative ability of these methods to reduce systematic differences between treated and untreated subjects has not been examined. The authors used an empirical case study and Monte Carlo simulations to examine the relative ability of the 4 methods to balance baseline covariates between treated and untreated subjects. They used standardized differences in the propensity score matched sample and in the weighted sample. For stratification on the propensity score, within-quintile standardized differences were computed comparing the distribution of baseline covariates between treated and untreated subjects within the same quintile of the propensity score. These quintile-specific standardized differences were then averaged across the quintiles. For covariate adjustment, the authors used the weighted conditional standardized absolute difference to compare balance between treated and untreated subjects. In both the empirical case study and in the Monte Carlo simulations, they found that matching on the propensity score and weighting using the inverse probability of treatment eliminated a greater degree of the systematic differences between treated and untreated subjects compared with the other 2 methods. In the Monte Carlo simulations, propensity score matching tended to have either comparable or marginally superior performance compared with propensity-score weighting.
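As an illustration of the balance diagnostics described above, the following is a minimal Python sketch, not the authors' code. It computes a standardized difference for one continuous covariate in the crude sample, in the sample weighted by the inverse probability of treatment, and averaged across propensity-score quintiles. Several choices here are assumptions not stated in the abstract: the function names are hypothetical, the data are simulated, the true propensity score is used in place of an estimated one, the weighted variance uses a common reliability-weights formula, and absolute values are taken before averaging across quintiles. Matching and covariate adjustment are not shown.

```python
import math
import random


def iptw_weights(z, ps):
    """Inverse probability of treatment weights: 1/e if treated, 1/(1-e) if not."""
    return [1.0 / e if t == 1 else 1.0 / (1.0 - e) for t, e in zip(z, ps)]


def _mean_var(values, weights):
    """Weighted mean and weighted sample variance of one group."""
    sw = sum(weights)
    sw2 = sum(w * w for w in weights)
    mean = sum(w * v for v, w in zip(values, weights)) / sw
    ss = sum(w * (v - mean) ** 2 for v, w in zip(values, weights))
    # Assumed variance formula; it reduces to the usual (n - 1) sample
    # variance when every weight equals 1.
    var = sw / (sw * sw - sw2) * ss
    return mean, var


def std_diff(x, z, w=None):
    """Standardized difference of covariate x between treated (z=1) and untreated (z=0)."""
    if w is None:
        w = [1.0] * len(x)
    stats = {}
    for g in (1, 0):
        vals = [xi for xi, zi in zip(x, z) if zi == g]
        wts = [wi for wi, zi in zip(w, z) if zi == g]
        stats[g] = _mean_var(vals, wts)
    (m1, v1), (m0, v0) = stats[1], stats[0]
    # Difference in means divided by the pooled standard deviation.
    return (m1 - m0) / math.sqrt((v1 + v0) / 2.0)


def stratified_std_diff(x, z, ps, n_strata=5):
    """Average of within-stratum absolute standardized differences."""
    order = sorted(range(len(x)), key=lambda i: ps[i])
    n = len(order)
    diffs = []
    for k in range(n_strata):
        # Equal-size strata by rank of the propensity score (quintiles by default).
        idx = order[k * n // n_strata:(k + 1) * n // n_strata]
        d = std_diff([x[i] for i in idx], [z[i] for i in idx])
        diffs.append(abs(d))
    return sum(diffs) / n_strata


# Simulated example (assumption): one confounder drives treatment assignment.
random.seed(0)
n = 5000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
ps = [1.0 / (1.0 + math.exp(-xi)) for xi in x]  # true propensity score
z = [1 if random.random() < e else 0 for e in ps]

crude = std_diff(x, z)
weighted = std_diff(x, z, iptw_weights(z, ps))
stratified = stratified_std_diff(x, z, ps)
print(f"crude: {crude:.3f}  IPTW: {weighted:.3f}  quintile-averaged: {stratified:.3f}")
```

In this sketch a standardized difference near zero indicates that the covariate is similarly distributed in the two groups. Each stratum must contain at least two treated and two untreated subjects for the variances to be defined.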