Thomas Laine E, Thomas Steven M, Li Fan, Matsouaka Roland A
Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, USA.
AstraZeneca, Gaithersburg, USA.
Epidemiol Methods. 2023 Nov 13;12(1):20220131. doi: 10.1515/em-2022-0131. eCollection 2023 Jan.
Propensity score (PS) weighting methods are commonly used to adjust for confounding in observational treatment comparisons. However, in the setting of substantial covariate imbalance, PS values may approach 0 or 1, yielding extreme weights and inflated variance of the estimated treatment effect. Adaptations of the standard inverse probability of treatment weights (IPTW) can reduce the influence of extremes, including trimming methods that exclude people with PS values near 0 or 1. Alternatively, overlap weighting (OW) optimizes criteria related to bias and variance, and performs well compared to other PS weighting and matching methods. However, it has not been compared to propensity score stratification (PSS), which shares some of the same potential advantages, notably insensitivity to extreme values. We sought to compare these methods in the setting of substantial covariate imbalance to generate practical recommendations.
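To make the contrast concrete, the following minimal sketch computes IPTW and overlap weights for a few illustrative PS values. It assumes the standard definitions (IPTW: 1/PS for treated and 1/(1 - PS) for untreated; OW: 1 - PS for treated and PS for untreated). The function names, the example PS values, and the trimming threshold of 0.05 are hypothetical choices for illustration and are not taken from the paper.

```python
def iptw_weight(ps, treated):
    """Inverse probability of treatment weight (standard definition)."""
    return 1.0 / ps if treated else 1.0 / (1.0 - ps)


def overlap_weight(ps, treated):
    """Overlap weight: probability of receiving the opposite treatment."""
    return 1.0 - ps if treated else ps


def keep_after_trimming(ps, alpha=0.05):
    """Symmetric trimming rule; alpha=0.05 is an assumed, illustrative threshold."""
    return alpha <= ps <= 1.0 - alpha


# Hypothetical PS values for treated individuals, from extreme to moderate.
example_ps = [0.01, 0.20, 0.50, 0.80, 0.99]

for ps in example_ps:
    print(
        f"PS={ps:.2f}  IPTW={iptw_weight(ps, True):7.2f}  "
        f"OW={overlap_weight(ps, True):.2f}  kept={keep_after_trimming(ps)}"
    )
```

In this sketch a treated individual with PS near 0 receives an IPTW weight of about 100, whereas the overlap weight can never exceed 1, which is why OW is not driven by extreme PS values.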
Analytical derivations were used to establish connections between methods, and simulation studies were conducted to assess bias and variance of alternative methods.
We find that OW is generally superior, particularly as covariate imbalance increases. In addition, a common method for implementing PSS based on Mantel-Haenszel weights (PSS-MH) is equivalent to a coarsened version of OW and can perform nearly as well. Finally, trimming methods increase bias across methods (IPTW, PSS and PSS-MH) unless the PS model is re-fit to the trimmed sample and weights or strata are re-derived. After trimming with re-fitting, all methods perform similarly to OW.
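One way to see the stated link between PSS-MH and OW is a small numerical check, sketched below under assumptions that are ours rather than the paper's: the effect is a difference in mean outcomes, the Mantel-Haenszel stratum weight is n1*n0/(n1+n0), and the "coarsened" PS for everyone in a stratum is the stratum's observed treated fraction. The stratum summaries are made-up numbers, and this is not the authors' analytical derivation.

```python
# Hypothetical stratum summaries:
# (n_treated, n_control, mean_outcome_treated, mean_outcome_control)
strata = [
    (5, 45, 0.40, 0.20),
    (20, 30, 0.50, 0.35),
    (40, 10, 0.70, 0.60),
]


def pss_mh_estimate(strata):
    """Stratified mean difference pooled with Mantel-Haenszel weights."""
    num = den = 0.0
    for n1, n0, y1, y0 in strata:
        w = n1 * n0 / (n1 + n0)  # MH weight for the stratum
        num += w * (y1 - y0)
        den += w
    return num / den


def coarsened_ow_estimate(strata):
    """Overlap-weighted mean difference using the stratum treated fraction as PS."""
    t_num = t_den = c_num = c_den = 0.0
    for n1, n0, y1, y0 in strata:
        e = n1 / (n1 + n0)            # coarsened PS shared by the whole stratum
        t_num += n1 * (1.0 - e) * y1  # n1 treated units, each weighted 1 - e
        t_den += n1 * (1.0 - e)
        c_num += n0 * e * y0          # n0 control units, each weighted e
        c_den += n0 * e
    return t_num / t_den - c_num / c_den


print(pss_mh_estimate(strata), coarsened_ow_estimate(strata))
```

The two estimates agree up to floating-point error because n1*(1 - e) and n0*e both equal n1*n0/(n1+n0), so each stratum contributes to both estimators with the same weight.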
These results may guide the selection, implementation and reporting of PS methods for observational studies with substantial covariate imbalance.