Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, U.S.A.
Stat Med. 2014 May 10;33(10):1685-99. doi: 10.1002/sim.6058. Epub 2013 Dec 9.
Inferring causation from non-randomized studies of exposure requires that exposure groups can be balanced with respect to prognostic factors for the outcome. Although there is broad agreement in the literature that balance should be checked, there is confusion regarding the appropriate metric. We present a simulation study that compares several balance metrics with respect to the strength of their association with bias in estimating the effect of a binary exposure on a binary, count, or continuous outcome. The simulations use matching on the propensity score with successively decreasing calipers to produce datasets with varying covariate balance. We propose the post-matching C-statistic as a balance metric and find that it has consistently strong associations with estimation bias, even when the propensity score model is misspecified, as long as the propensity score is estimated in a sufficiently large study. This metric, along with the average standardized difference and the general weighted difference, outperformed all other metrics considered in association with bias, including the unstandardized absolute difference, Kolmogorov-Smirnov and Lévy distances, overlapping coefficient, Mahalanobis balance, and L1 metrics. Of the best-performing metrics, the C-statistic and general weighted difference have the additional advantage that they automatically evaluate balance on all covariates simultaneously and can easily incorporate balance on interactions among covariates. Therefore, when combined with the usual practice of comparing individual covariate means and standard deviations across exposure groups, these metrics may provide useful summaries of the observed covariate imbalance.
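As an illustration of two of the best-performing metrics named above, the following is a minimal sketch rather than the authors' implementation. It assumes that the post-matching C-statistic is the C-statistic (area under the ROC curve) of a main-effects logistic propensity score model re-fit within the matched sample, and that the average standardized difference is the mean over covariates of the absolute difference in group means divided by the pooled standard deviation. The function names, the NumPy-only logistic fit, and the input layout (a covariate matrix `X` and a 0/1 exposure vector for an already matched sample) are hypothetical choices made for this example.

```python
import numpy as np


def average_standardized_difference(X, exposed):
    """Mean over covariates of |mean_exposed - mean_unexposed| / pooled SD."""
    X = np.asarray(X, dtype=float)
    exposed = np.asarray(exposed, dtype=bool)
    x1, x0 = X[exposed], X[~exposed]
    pooled_sd = np.sqrt((x1.var(axis=0, ddof=1) + x0.var(axis=0, ddof=1)) / 2.0)
    return float(np.mean(np.abs(x1.mean(axis=0) - x0.mean(axis=0)) / pooled_sd))


def fit_propensity(X, exposed, n_iter=50, ridge=1e-8):
    """Fitted exposure probabilities from a main-effects logistic model.

    Newton-Raphson iterations; the tiny ridge term is an assumption added
    only to keep the linear solve numerically stable.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(exposed, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # intercept + covariates
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(design @ beta, -30, 30)))
        gradient = design.T @ (y - p)
        hessian = design.T @ (design * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian + ridge * np.eye(beta.size), gradient)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return 1.0 / (1.0 + np.exp(-np.clip(design @ beta, -30, 30)))


def c_statistic(scores, exposed):
    """Probability that a random exposed unit scores higher than a random
    unexposed unit, counting ties as one half (equivalent to the ROC AUC).

    Uses an all-pairs comparison, so memory grows with n_exposed * n_unexposed.
    """
    scores = np.asarray(scores, dtype=float)
    exposed = np.asarray(exposed, dtype=bool)
    s1, s0 = scores[exposed], scores[~exposed]
    greater = (s1[:, None] > s0[None, :]).mean()
    ties = (s1[:, None] == s0[None, :]).mean()
    return float(greater + 0.5 * ties)


def post_matching_c_statistic(X_matched, exposed_matched):
    """Re-fit the propensity model in the matched sample and score it."""
    scores = fit_propensity(X_matched, exposed_matched)
    return c_statistic(scores, exposed_matched)
```

Under these assumptions, a post-matching C-statistic close to 0.5 means the covariates can no longer discriminate exposed from unexposed subjects, which is the sense in which it summarizes balance on all covariates at once. Interaction terms could be appended as extra columns of `X_matched` to check balance on interactions as well.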