Cecil G. Sheps Center for Health Services Research, University of North Carolina at Chapel Hill, NC 27599, USA.
Pharmacoepidemiol Drug Saf. 2013 Feb;22(2):138-44. doi: 10.1002/pds.3396. Epub 2012 Dec 28.
The choice of propensity score (PS) implementation influences treatment effect estimates not only because different methods estimate different quantities, but also because different estimators respond in different ways to phenomena such as treatment effect heterogeneity and limited availability of potential matches. Using effectiveness data, we describe lessons learned from sensitivity analyses with matched and weighted estimates.
With subsample data (N = 1292) from Sequenced Treatment Alternatives to Relieve Depression, a 2001-2004 effectiveness trial of depression treatments, we implemented PS matching and weighting to estimate the treatment effect in the treated and conducted multiple sensitivity analyses.
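As an illustration of the weighting approach named above, the following is a minimal sketch, not the authors' implementation. It assumes the PS has already been estimated and that treatment and outcome are coded 0/1. It uses the standard weights for the effect in the treated (1 for treated observations, PS/(1 - PS) for untreated observations) and computes a weighted risk ratio. The function names `att_weights` and `weighted_risk_ratio` are hypothetical.

```python
def att_weights(treated, ps):
    """Weights targeting the treatment effect in the treated.

    Treated observations get weight 1; untreated observations get
    ps / (1 - ps), which reweights them to resemble the treated group.
    """
    return [1.0 if t else p / (1.0 - p) for t, p in zip(treated, ps)]


def weighted_risk_ratio(treated, outcome, weights):
    """Ratio of weighted outcome risk in treated vs. untreated (binary outcome)."""
    def risk(flag):
        num = sum(w * y for t, y, w in zip(treated, outcome, weights) if t == flag)
        den = sum(w for t, w in zip(treated, weights) if t == flag)
        return num / den
    return risk(1) / risk(0)


# Toy data (illustrative only, not from the study)
treated = [1, 1, 0, 0]
outcome = [1, 0, 1, 0]
ps = [0.5, 0.5, 0.2, 0.8]

w = att_weights(treated, ps)
rr = weighted_risk_ratio(treated, outcome, w)
```

In the toy data, the untreated observation with PS 0.8 receives a weight of about 4 while the one with PS 0.2 receives 0.25, which shows how untreated observations with high PS (treated contrary to prediction) can dominate a weighted estimate.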
Matching and weighting both balanced covariates but yielded different samples and treatment effect estimates (matched RR 1.00, 95% CI: 0.75-1.34; weighted RR 1.28, 95% CI: 0.97-1.69). In sensitivity analyses, as increasing numbers of observations at both ends of the PS distribution were excluded from the weighted analysis, weighted estimates approached the matched estimate (weighted RR 1.04, 95% CI: 0.77-1.39 after excluding all observations below the 5th percentile of the treated and above the 95th percentile of the untreated). Treatment appeared to have benefits only in the highest and lowest PS strata.
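The percentile-based exclusion described above can be sketched as follows. This is a hedged illustration under stated assumptions: the linear-interpolation percentile rule and the helper names `percentile` and `trim_indices` are choices made here, not details reported in the abstract. Observations are kept only if their PS lies between the chosen lower percentile of the treated PS distribution and the corresponding upper percentile of the untreated PS distribution.

```python
def percentile(values, p):
    """p-th percentile (0-100) using linear interpolation between order statistics."""
    xs = sorted(values)
    k = (len(xs) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def trim_indices(treated, ps, pct):
    """Indices of observations retained after trimming both PS tails.

    Drops observations below the pct-th percentile of PS among the treated
    and above the (100 - pct)-th percentile of PS among the untreated.
    """
    lower = percentile([p for t, p in zip(treated, ps) if t], pct)
    upper = percentile([p for t, p in zip(treated, ps) if not t], 100 - pct)
    return [i for i, p in enumerate(ps) if lower <= p <= upper]


# Toy data (illustrative only, not from the study)
treated = [1, 1, 1, 0, 0, 0]
ps = [0.2, 0.5, 0.9, 0.1, 0.4, 0.6]
kept = trim_indices(treated, ps, 0)  # pct=0 restricts to the region of common support
```

Repeating the weighted analysis on the retained indices for a range of `pct` values (for example 0, 1, 2.5, 5) gives a sensitivity analysis of the kind reported here, where the weighted estimate can be compared with the matched estimate at each cutoff.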
Matched and weighted estimates differed due to incomplete matching, sensitivity of weighted estimates to extreme observations, and possibly treatment effect heterogeneity. PS analysis requires identifying the population and treatment effect of interest, selecting an appropriate implementation method, and conducting and reporting sensitivity analyses. Weighted estimation especially should include sensitivity analyses relating to influential observations, such as those treated contrary to prediction.