Shang Yimeng, Chiu Yu-Han, Kong Lan
Department of Public Health Sciences, College of Medicine, The Pennsylvania State University, Hershey, PA, USA.
Stat Methods Med Res. 2025 Mar;34(3):457-472. doi: 10.1177/09622802241308709. Epub 2025 Feb 12.
Propensity score estimation is often a preliminary step in estimating the average treatment effect from observational data. However, misspecification of the propensity score model undermines the validity of effect estimates in subsequent analyses. Prediction-based machine learning algorithms are increasingly used to estimate propensity scores because they allow for more complex relationships among covariates. These approaches, however, do not necessarily achieve covariate balance. We propose a calibration-based method that better incorporates covariate balance properties within a general modeling framework. Specifically, we calibrate the loss function by adding a covariate imbalance penalty to standard parametric models (e.g., logistic regression) or machine learning models (e.g., neural networks). By explicitly accounting for covariate balance during propensity score estimation, our approach may mitigate the impact of model misspecification. Empirical results show that the proposed method is robust to propensity score model misspecification: loss function calibration improves covariate balance and reduces the root-mean-square error of causal effect estimates. When the propensity score model is misspecified, the neural-network-based model yields the best estimator, with less bias and smaller variance than the other methods considered.
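The idea of adding a covariate imbalance penalty to the loss function can be illustrated with a minimal sketch for the logistic regression case. The abstract does not give the exact form of the penalty, the tuning parameter, or the optimizer, so everything below is an assumption made for illustration: the penalty is taken to be the sum of squared differences in inverse-probability-weighted covariate means between treatment arms, `lam` is a hypothetical penalty weight, `simulate` is a hypothetical toy data generator, and the fit uses finite-difference gradient descent with a backtracking step, which is a simple stand-in rather than the authors' procedure.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def propensity(beta, x):
    # Logistic propensity model: beta[0] is the intercept.
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def clipped_scores(beta, X, eps=1e-6):
    # Clip scores away from 0 and 1 so logs and inverse weights stay finite.
    return [min(max(propensity(beta, x), eps), 1.0 - eps) for x in X]


def cross_entropy(beta, X, T):
    # Standard prediction loss for a binary treatment indicator.
    ps = clipped_scores(beta, X)
    return -sum(t * math.log(e) + (1 - t) * math.log(1.0 - e)
                for t, e in zip(T, ps)) / len(T)


def imbalance(beta, X, T):
    # ASSUMED penalty: squared difference between the IPW-weighted
    # covariate means of the treated and control arms, summed over covariates.
    ps = clipped_scores(beta, X)
    w1 = [t / e for t, e in zip(T, ps)]
    w0 = [(1 - t) / (1.0 - e) for t, e in zip(T, ps)]
    s1, s0 = sum(w1), sum(w0)
    total = 0.0
    for j in range(len(X[0])):
        m1 = sum(w * x[j] for w, x in zip(w1, X)) / s1
        m0 = sum(w * x[j] for w, x in zip(w0, X)) / s0
        total += (m1 - m0) ** 2
    return total


def calibrated_loss(beta, X, T, lam):
    # Prediction loss plus lam times the covariate imbalance penalty.
    return cross_entropy(beta, X, T) + lam * imbalance(beta, X, T)


def fit(X, T, lam, steps=100, h=1e-5):
    # Finite-difference gradient descent with a backtracking step size.
    # Only strictly improving steps are accepted, so the loss never increases.
    beta = [0.0] * (len(X[0]) + 1)
    loss = calibrated_loss(beta, X, T, lam)
    for _ in range(steps):
        grad = []
        for k in range(len(beta)):
            up, dn = beta[:], beta[:]
            up[k] += h
            dn[k] -= h
            grad.append((calibrated_loss(up, X, T, lam)
                         - calibrated_loss(dn, X, T, lam)) / (2 * h))
        step = 1.0
        while step > 1e-8:
            cand = [b - step * g for b, g in zip(beta, grad)]
            cand_loss = calibrated_loss(cand, X, T, lam)
            if cand_loss < loss:
                beta, loss = cand, cand_loss
                break
            step /= 2
        else:
            break  # no improving step found; stop early
    return beta


def simulate(n, seed):
    # Hypothetical toy data: the true propensity includes an interaction
    # term that the fitted logistic model omits (a misspecified model).
    rng = random.Random(seed)
    X, T = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        p = sigmoid(0.5 * x[0] + 0.5 * x[0] * x[1])
        X.append(x)
        T.append(1 if rng.random() < p else 0)
    return X, T
```

As a usage example, `X, T = simulate(200, seed=0)` followed by `fit(X, T, lam=0.0)` and `fit(X, T, lam=5.0)` gives an unpenalized and a penalized fit, and `imbalance(beta, X, T)` can then be compared between the two. Setting `lam=0.0` recovers ordinary logistic regression. The same penalty term could in principle be added to a neural network's training loss, but that case is not sketched here.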