Cannas Massimo, Arpino Bruno
Department of Economic and Business Sciences, University of Cagliari, Cagliari, Italy.
Department of Statistics, Computer Science, Applications, University of Firenze, Firenze, Italy.
Biom J. 2019 Jul;61(4):1049-1072. doi: 10.1002/bimj.201800132. Epub 2019 May 14.
Propensity score matching (PSM) and propensity score weighting (PSW) are popular tools to estimate causal effects in observational studies. We address two open issues: how to estimate propensity scores and how to assess covariate balance. Using simulations, we compare the performance of PSM and PSW based on logistic regression and machine learning algorithms (CART; Bagging; Boosting; Random Forest; Neural Networks; naive Bayes). Additionally, we consider several measures of covariate balance (Absolute Standardized Average Mean (ASAM) with and without interactions; measures based on quantile-quantile plots; ratio between the variances of propensity scores; area under the curve (AUC)) and assess their ability to predict the bias of PSM and PSW estimators. We also investigate the importance of tuning machine learning parameters in the context of propensity score methods. Two simulation designs are employed. In the first, the data-generating processes are inspired by birth register data used to assess the effect of labor induction on the occurrence of caesarean section. The second uses more general generating mechanisms. Overall, among the different techniques, random forests performed best, especially in PSW. Logistic regression and neural networks also performed excellently, close to random forests. As for covariate balance, the simplest and most commonly used metric, the ASAM, showed a strong correlation with the bias of causal effect estimators. Our findings suggest that researchers should aim to obtain an ASAM lower than 10% for as many variables as possible. In the empirical study we found that labor induction had a small, statistically non-significant impact on caesarean section.
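The abstract recommends keeping standardized mean differences below 10% but does not spell out the formulas. The following is a minimal Python sketch of a PSW balance check, assuming the common definition of a standardized difference (difference in weighted group means divided by the pooled unweighted standard deviation) and inverse-probability weights for the average treatment effect. The paper's exact definitions may differ, and all function names (`std_mean_diff`, `asam`, `share_balanced`, `ate_weights`, `weighted_effect`) are hypothetical. Propensity scores are taken as given, whichever method (logistic regression, random forest, ...) produced them.

```python
import math
from typing import Optional, Sequence


def _wmean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted arithmetic mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def _var(values: Sequence[float]) -> float:
    """Unweighted sample variance (n - 1 denominator)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def std_mean_diff(x: Sequence[float], t: Sequence[int],
                  w: Optional[Sequence[float]] = None) -> float:
    """Standardized difference in means of covariate x between treated (t=1)
    and controls (t=0). Means use weights w (all 1 if None).
    Assumption: the denominator is the pooled unweighted SD
    sqrt((s_T^2 + s_C^2) / 2), so it stays fixed after weighting."""
    if w is None:
        w = [1.0] * len(x)
    xt = [xi for xi, ti in zip(x, t) if ti == 1]
    xc = [xi for xi, ti in zip(x, t) if ti == 0]
    wt = [wi for wi, ti in zip(w, t) if ti == 1]
    wc = [wi for wi, ti in zip(w, t) if ti == 0]
    pooled_sd = math.sqrt((_var(xt) + _var(xc)) / 2.0)
    return (_wmean(xt, wt) - _wmean(xc, wc)) / pooled_sd


def asam(covariates: Sequence[Sequence[float]], t: Sequence[int],
         w: Optional[Sequence[float]] = None) -> float:
    """Average absolute standardized difference over covariates.
    `covariates` holds one column (list of values) per covariate."""
    diffs = [abs(std_mean_diff(x, t, w)) for x in covariates]
    return sum(diffs) / len(diffs)


def share_balanced(covariates: Sequence[Sequence[float]], t: Sequence[int],
                   w: Optional[Sequence[float]] = None,
                   threshold: float = 0.10) -> float:
    """Fraction of covariates whose absolute standardized difference is
    below `threshold` (0.10, i.e. the 10% rule of thumb)."""
    ok = [abs(std_mean_diff(x, t, w)) < threshold for x in covariates]
    return sum(ok) / len(ok)


def ate_weights(ps: Sequence[float], t: Sequence[int]) -> list:
    """Inverse-probability weights for the average treatment effect:
    1/e(x) for treated units, 1/(1 - e(x)) for controls."""
    return [1.0 / e if ti == 1 else 1.0 / (1.0 - e) for e, ti in zip(ps, t)]


def weighted_effect(y: Sequence[float], t: Sequence[int],
                    w: Sequence[float]) -> float:
    """Difference between the normalized weighted outcome means of treated
    units and controls (a simple PSW effect estimate)."""
    yt = [yi for yi, ti in zip(y, t) if ti == 1]
    yc = [yi for yi, ti in zip(y, t) if ti == 0]
    wt = [wi for wi, ti in zip(w, t) if ti == 1]
    wc = [wi for wi, ti in zip(w, t) if ti == 0]
    return _wmean(yt, wt) - _wmean(yc, wc)


if __name__ == "__main__":
    t = [1, 1, 0, 0]
    x1 = [2.0, 4.0, 0.0, 2.0]
    print(asam([x1], t))                          # sqrt(2), about 1.414
    print(asam([x1], t, [3.0, 1.0, 1.0, 3.0]))    # 1/sqrt(2), about 0.707
```

In practice, `ate_weights` would be fed the estimated propensity scores, and `asam` and `share_balanced` would be compared before and after weighting. Keeping the denominator unweighted is a deliberate choice: it ensures that a drop in ASAM reflects closer group means rather than a rescaled spread.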