Deshpande Shachi, Kuleshov Volodymyr
Dept of Computer Science, Cornell University and Cornell Tech, New York, NY, USA.
Uncertain Artif Intell. 2024 Jul;2024:1083-1111. Epub 2024 Jul 15.
Propensity scores are commonly used to estimate treatment effects from observational data. We argue that the probabilistic output of a learned propensity score model should be calibrated (i.e., a predicted treatment probability of 90% should correspond to 90% of individuals being assigned to the treatment group), and we propose simple recalibration techniques to ensure this property. We prove that calibration is a necessary condition for unbiased treatment effect estimation when using popular inverse propensity weighted and doubly robust estimators. We derive error bounds on causal effect estimates that directly relate to the quality of uncertainties provided by the probabilistic propensity score model and show that calibration strictly improves this error bound while also avoiding extreme propensity weights. We demonstrate improved causal effect estimation with calibrated propensity scores in several tasks including high-dimensional image covariates and genome-wide association studies (GWASs). Calibrated propensity scores improve the speed of GWAS analysis by more than two-fold by enabling the use of simpler models that are faster to train.
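The following is a minimal sketch of the idea in the abstract, not the authors' implementation. It simulates a confounded dataset, scores it with a deliberately overconfident propensity model, recalibrates those scores, and compares inverse propensity weighted (IPW) estimates before and after. Everything in it is an illustrative assumption: the data-generating process, the true effect of 2.0, the overconfident model `sigmoid(3x)`, and the use of histogram binning as the recalibrator (the abstract says only "simple recalibration techniques"). For brevity the recalibrator is fit on the same sample it is applied to; a held-out split would normally be used.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, seed=0):
    """Hypothetical confounded data: x drives both treatment and outcome."""
    rng = random.Random(seed)
    xs, ts, ys = [], [], []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        t = 1 if rng.random() < sigmoid(x) else 0   # true propensity is sigmoid(x)
        y = 2.0 * t + x + rng.gauss(0.0, 0.1)       # assumed true treatment effect: 2.0
        xs.append(x)
        ts.append(t)
        ys.append(y)
    return xs, ts, ys


def bin_index(score, n_bins):
    return min(int(score * n_bins), n_bins - 1)


def histogram_recalibrate(scores, ts, n_bins=10, eps=1e-3):
    """Replace each score with the observed treatment rate in its score bin."""
    treated = [0] * n_bins
    count = [0] * n_bins
    for s, t in zip(scores, ts):
        b = bin_index(s, n_bins)
        treated[b] += t
        count[b] += 1
    rate = [treated[b] / count[b] if count[b] else 0.5 for b in range(n_bins)]
    # Clip away from 0 and 1 so the IPW weights stay finite.
    return [min(max(rate[bin_index(s, n_bins)], eps), 1.0 - eps) for s in scores]


def calibration_error(scores, ts, n_bins=10):
    """Binned calibration error: weighted mean |average score - treatment rate|."""
    sum_s = [0.0] * n_bins
    sum_t = [0] * n_bins
    count = [0] * n_bins
    for s, t in zip(scores, ts):
        b = bin_index(s, n_bins)
        sum_s[b] += s
        sum_t[b] += t
        count[b] += 1
    n = len(scores)
    return sum(abs(sum_s[b] - sum_t[b]) / n for b in range(n_bins) if count[b])


def ipw_ate(ys, ts, ps):
    """IPW estimate of the average treatment effect."""
    n = len(ys)
    return sum(t * y / p - (1 - t) * y / (1.0 - p) for y, t, p in zip(ys, ts, ps)) / n


xs, ts, ys = simulate(20000)
raw_scores = [sigmoid(3.0 * x) for x in xs]         # overconfident propensity model
cal_scores = histogram_recalibrate(raw_scores, ts)

ece_raw = calibration_error(raw_scores, ts)
ece_cal = calibration_error(cal_scores, ts)
ate_raw = ipw_ate(ys, ts, raw_scores)
ate_cal = ipw_ate(ys, ts, cal_scores)

print("calibration error (raw, recalibrated):", ece_raw, ece_cal)
print("IPW ATE estimate  (raw, recalibrated):", ate_raw, ate_cal)
```

In this toy setup the overconfident scores produce very large weights near the ends of the covariate range, while the recalibrated scores keep the weights moderate. Histogram binning leaves some residual confounding within each bin, so the recalibrated estimate is closer to the assumed effect rather than exactly unbiased.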