Antonelli Joseph, Parmigiani Giovanni, Dominici Francesca
Department of Statistics, University of Florida, 102 Griffin-Floyd Hall, P.O. Box 118545, Gainesville, FL, 32611, USA.
Department of Biostatistics and Computational Biology, CLS 11007, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, 02215, USA.
Bayesian Anal. 2019 Sep;14(3):805-828. doi: 10.1214/18-ba1131. Epub 2019 Jun 11.
In observational studies, estimation of the causal effect of a treatment on an outcome relies on proper adjustment for confounding. If the number of potential confounders (p) is larger than the number of observations (n), then direct control for all potential confounders is infeasible. Existing approaches for dimension reduction and penalization are generally aimed at predicting the outcome, and are less suited for estimation of causal effects. Under standard penalization approaches (e.g. Lasso), if a variable X_j is strongly associated with the treatment T but only weakly with the outcome Y, its coefficient β_j will be shrunk towards zero, leading to confounding bias. Under the assumption of a linear model for the outcome and sparsity, we propose continuous spike and slab priors on the regression coefficients β_j corresponding to the potential confounders X_j. Specifically, we introduce a prior distribution that does not heavily shrink to zero the coefficients (β_j's) of the X_j's that are strongly associated with T but weakly associated with Y. We compare our proposed approach to several state-of-the-art methods from the literature. Our proposed approach has the following features: 1) it reduces confounding bias in high-dimensional settings; 2) it shrinks towards zero the coefficients of instrumental variables; and 3) it achieves good coverage even with small sample sizes. We apply our approach to the National Health and Nutrition Examination Survey (NHANES) data to estimate the causal effects of persistent pesticide exposure on triglyceride levels.
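The confounding-bias mechanism described in the abstract can be illustrated with a small simulation. This is a hedged sketch, not the authors' method: it uses a toy, fixed version of a continuous spike-and-slab prior in which each coefficient receives either a near-zero ("spike") or a diffuse ("slab") Gaussian prior variance, so the posterior mean reduces to a generalized ridge estimate. The variance values, the data-generating model, and the rule that gives slab variance to any X_j with |corr(X_j, T)| > 0.3 are hypothetical choices made for illustration only; in a full spike-and-slab analysis the spike/slab assignment is uncertain and inferred from the data rather than fixed in advance.

```python
import numpy as np

# Hypothetical prior variances (illustrative values, not from the paper).
SPIKE_VAR = 1e-6   # near-zero variance: coefficient shrunk essentially to 0
SLAB_VAR = 10.0    # diffuse variance: coefficient barely shrunk
FLAT_VAR = 1e6     # treatment coefficient left essentially unpenalized


def posterior_mean(design, y, prior_var, noise_var=1.0):
    """Posterior mean under independent N(0, prior_var[k]) priors and Gaussian
    noise with known variance; this is a generalized ridge estimator."""
    precision = design.T @ design + noise_var * np.diag(1.0 / prior_var)
    return np.linalg.solve(precision, design.T @ y)


def simulate(n=5000, p=50, seed=0):
    """Hypothetical toy data: X_0 drives T strongly but affects Y only weakly."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    t = 2.0 * X[:, 0] + rng.standard_normal(n)  # treatment T
    # True effect of T on Y is 1.0; X_0 has a weak direct effect (0.3).
    y = 1.0 * t + 0.3 * X[:, 0] + 1.0 * X[:, 1] + rng.standard_normal(n)
    return X, t, y


def treatment_effect(X, t, y, slab_mask):
    """Effect of T when confounders in slab_mask get slab variance, others spike."""
    design = np.column_stack([t, X])  # column 0 is T
    confounder_var = np.where(slab_mask, SLAB_VAR, SPIKE_VAR)
    prior_var = np.concatenate([[FLAT_VAR], confounder_var])
    return posterior_mean(design, y, prior_var)[0]


X, t, y = simulate()
p = X.shape[1]

# Every confounder in the spike: beta_0 is forced to ~0, mimicking aggressive
# shrinkage of a variable that is only weakly associated with Y.
est_all_spike = treatment_effect(X, t, y, np.zeros(p, dtype=bool))

# Hypothetical treatment-informed rule: slab for X_j with |corr(X_j, T)| > 0.3.
corr_with_t = np.array([np.corrcoef(X[:, j], t)[0, 1] for j in range(p)])
slab_mask = np.abs(corr_with_t) > 0.3
est_informed = treatment_effect(X, t, y, slab_mask)

print(f"all spike:          {est_all_spike:.3f}")
print(f"treatment-informed: {est_informed:.3f}")
```

Under this toy model, shrinking β_0 to zero reduces the fit to a regression of Y on T alone, whose population slope is 1 + 0.3·cov(X_0, T)/var(T) = 1 + 0.3·2/5 = 1.12, so the all-spike estimate should sit near 1.12. Keeping X_0 in the slab, because it is strongly associated with T, should instead give an estimate near the true effect of 1.0.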