Leite Walter L, Aydin Burak, Cetin-Berber Dee D
University of Florida, Gainesville, FL, USA.
Ege University, Izmir, Turkey.
Eval Rev. 2021 Feb-Apr;45(1-2):34-69. doi: 10.1177/0193841X211020245. Epub 2021 Jun 22.
Propensity score analysis (PSA) is a popular method for removing selection bias due to observed covariates in quasi-experimental designs, but missing data on covariates must be handled before propensity scores can be estimated. Multiple imputation (MI) and single imputation (SI) are two approaches to handling missing data in PSA.
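Propensity scores are typically estimated with a logistic regression of treatment assignment on the covariates, which is why every covariate value must be available before estimation. The sketch below is a minimal, dependency-free illustration under that assumption; `fit_logistic` and `propensity_scores` are hypothetical helper names, the data are simulated, and plain gradient ascent stands in for the Newton-type fitting that statistical packages normally use.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, t, lr=0.5, iters=1000):
    """Fit logit P(T = 1 | X) by gradient ascent on the mean log-likelihood.

    X is a list of covariate rows without an intercept column (one is added
    here); t is a list of 0/1 treatment indicators. Returns
    [intercept, b1, ..., bk].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(iters):
        grad = [0.0] * (k + 1)
        for row, ti in zip(X, t):
            xi = [1.0] + list(row)
            resid = ti - sigmoid(sum(b * x for b, x in zip(beta, xi)))
            for j in range(k + 1):
                grad[j] += resid * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Estimated probability of treatment for each covariate row."""
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row))) for row in X]


# Simulated example: treatment is more likely for larger x1 and smaller x2.
random.seed(1)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(300)]
t = [1 if random.random() < sigmoid(-0.5 + 1.0 * a - 0.5 * b) else 0 for a, b in X]
beta = fit_logistic(X, t)
ps = propensity_scores(X, beta)
```

A row with a missing covariate value cannot enter `fit_logistic` at all, so it would have to be dropped or imputed first; that is the gap the imputation strategies discussed here are meant to fill.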
The objectives of this study are to review the MI-within, MI-across, and SI approaches for handling missing data on covariates prior to PSA, to investigate the robustness of MI-across and SI with a Monte Carlo simulation study, and to demonstrate missing data handling and PSA with a step-by-step illustrative example.
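As the terms are commonly used in the literature, MI-within estimates the treatment effect separately in each imputed dataset and then pools the estimates, whereas MI-across first averages each unit's propensity score over the imputed datasets and then estimates the effect once. The following sketch contrasts the two point estimates, assuming that propensity scores from m imputations are already available and that ATT odds weights are used; the function names and the four-unit toy data are hypothetical, and variance pooling by Rubin's rules is omitted.

```python
from statistics import mean


def att_weights(t, ps):
    """ATT weights: treated units get 1, controls get the odds ps / (1 - ps)."""
    return [1.0 if ti == 1 else p / (1.0 - p) for ti, p in zip(t, ps)]


def att_estimate(t, y, ps):
    """Treated outcome mean minus the odds-weighted control outcome mean."""
    w = att_weights(t, ps)
    treated = [yi for ti, yi in zip(t, y) if ti == 1]
    ctrl = [(wi, yi) for ti, wi, yi in zip(t, w, y) if ti == 0]
    ctrl_mean = sum(wi * yi for wi, yi in ctrl) / sum(wi for wi, _ in ctrl)
    return mean(treated) - ctrl_mean


def mi_within(t, y, ps_by_imputation):
    """Estimate the ATT in each imputed dataset, then average the estimates."""
    return mean(att_estimate(t, y, ps) for ps in ps_by_imputation)


def mi_across(t, y, ps_by_imputation):
    """Average each unit's propensity score over imputations, then estimate once."""
    avg_ps = [mean(scores) for scores in zip(*ps_by_imputation)]
    return att_estimate(t, y, avg_ps)


# Toy data: two treated and two control units, propensity scores from m = 2 imputations.
t = [1, 1, 0, 0]
y = [5.0, 7.0, 2.0, 4.0]
ps_sets = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.25, 0.75]]
print(mi_within(t, y, ps_sets))  # about 2.6
print(mi_across(t, y, ps_sets))  # about 2.529 (43/17)
```

The two approaches generally give different point estimates because the ATT is a nonlinear function of the propensity scores, so averaging scores first is not the same as averaging effect estimates.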
The Monte Carlo simulation study compared strategies for imputing missing data in continuous and categorical covariates prior to estimating propensity scores. Manipulated conditions included sample size, the number of covariates, the size of the treatment effect, the missing data mechanism, and the percentage of missing data. Imputation strategies included MI-across and SI, each implemented with either joint modeling or multivariate imputation by chained equations (MICE).
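To illustrate how SI and MI with chained equations differ, the sketch below imputes a dataset with two continuous covariates by alternating stochastic regression imputations: running it once gives a single imputation, and running it m times with different random seeds gives m imputed datasets. This is a deliberately simplified, assumption-laden version of MICE. It handles only two continuous columns, uses linear regression with Gaussian noise, and keeps the regression coefficients fixed at their least-squares estimates instead of drawing them from a posterior as proper MICE implementations do. Categorical covariates would need, for example, logistic or multinomial imputation models. All names are hypothetical.

```python
import random
from statistics import mean


def simple_ols(x, y):
    """Intercept, slope, and residual SD of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sd = (sum(r * r for r in resid) / (len(x) - 2)) ** 0.5
    return intercept, slope, sd


def chained_impute(data, cycles=10, rng=None):
    """Fill None entries of a two-column dataset by chained stochastic regression."""
    rng = rng or random.Random()
    missing = [[row[j] is None for row in data] for j in range(2)]
    filled = [list(row) for row in data]
    # Start each incomplete column from its observed mean.
    for j in range(2):
        obs_mean = mean(row[j] for row in data if row[j] is not None)
        for row in filled:
            if row[j] is None:
                row[j] = obs_mean
    for _ in range(cycles):
        for j in range(2):
            if not any(missing[j]):
                continue  # fully observed column: nothing to impute
            k = 1 - j  # the other column serves as the predictor
            obs = [i for i, m in enumerate(missing[j]) if not m]
            a, b, sd = simple_ols([filled[i][k] for i in obs], [filled[i][j] for i in obs])
            for i, m in enumerate(missing[j]):
                if m:
                    # Prediction plus random residual, so imputations vary by seed.
                    filled[i][j] = a + b * filled[i][k] + rng.gauss(0.0, sd)
    return filled


def single_imputation(data, seed=0):
    return chained_impute(data, rng=random.Random(seed))


def multiple_imputation(data, m=5, seed=0):
    return [chained_impute(data, rng=random.Random(seed + r)) for r in range(m)]


# Simulated data: x2 depends on x1, and about 20% of x2 is deleted completely at random.
gen = random.Random(42)
complete = []
for _ in range(200):
    x1 = gen.gauss(0, 1)
    complete.append([x1, 0.8 * x1 + gen.gauss(0, 0.6)])
data = [[x1, None if gen.random() < 0.2 else x2] for x1, x2 in complete]
single = single_imputation(data)
imputed_sets = multiple_imputation(data, m=5)
```

With SI, later steps treat the one set of imputed values as if they were observed; with MI, the spread of imputed values across the m datasets carries the extra uncertainty due to missingness into the propensity score analysis.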
The results indicated that the MI-across method performed well and that SI also performed adequately when the percentage of missing data was small. The illustrative example, which uses data from the National Longitudinal Survey of Youth, demonstrated MI and SI, propensity score estimation, calculation of propensity score weights, covariate balance evaluation, estimation of the average treatment effect on the treated (ATT), and sensitivity analysis.
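Covariate balance after weighting is often summarized by the standardized mean difference (SMD) of each covariate between treated units and weighted controls. The sketch below assumes ATT weights (1 for treated units, ps / (1 - ps) for controls) and standardizes by the treated-group SD, one common convention for the ATT. The 0.1 cutoff is a widely used rule of thumb rather than a value taken from this study, and the helper names and toy data are hypothetical.

```python
from statistics import mean, stdev


def att_weights(t, ps):
    """Treated units get weight 1; controls get the odds ps / (1 - ps)."""
    return [1.0 if ti == 1 else p / (1.0 - p) for ti, p in zip(t, ps)]


def standardized_mean_difference(x, t, w):
    """(treated mean - weighted control mean) / SD of x among the treated."""
    xt = [xi for xi, ti in zip(x, t) if ti == 1]
    ctrl = [(wi, xi) for xi, ti, wi in zip(x, t, w) if ti == 0]
    ctrl_mean = sum(wi * xi for wi, xi in ctrl) / sum(wi for wi, _ in ctrl)
    return (mean(xt) - ctrl_mean) / stdev(xt)


def balance_table(covariates, t, ps, threshold=0.1):
    """For each covariate: (unweighted SMD, weighted SMD, |weighted SMD| < threshold)."""
    w = att_weights(t, ps)
    ones = [1.0] * len(t)
    table = {}
    for name, x in covariates.items():
        before = standardized_mean_difference(x, t, ones)
        after = standardized_mean_difference(x, t, w)
        table[name] = (before, after, abs(after) < threshold)
    return table


# Toy data: weighting shrinks the imbalance in x but not below the 0.1 cutoff.
t = [1, 1, 0, 0]
ps = [0.5, 0.5, 0.25, 0.75]
table = balance_table({"x": [2.0, 4.0, 1.0, 3.0]}, t, ps)
print(table["x"])  # about (0.707, 0.141, False)
```

A covariate that still fails the check after weighting would usually prompt a revised propensity score model (for example, adding interactions) before the ATT is estimated.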