Stefan Wager, Wenfei Du, Jonathan Taylor, Robert J. Tibshirani
Department of Statistics, Stanford University, Stanford, CA 94305;
Operations, Information & Technology, Stanford Graduate School of Business, Stanford University, Stanford, CA 94305.
Proc Natl Acad Sci U S A. 2016 Nov 8;113(45):12673-12678. doi: 10.1073/pnas.1614732113. Epub 2016 Oct 25.
We study the problem of treatment effect estimation in randomized experiments with high-dimensional covariate information and show that essentially any risk-consistent regression adjustment can be used to obtain efficient estimates of the average treatment effect. Our results considerably extend the range of settings where high-dimensional regression adjustments are guaranteed to provide valid inference about the population average treatment effect. We then propose cross-estimation, a simple method for obtaining finite-sample-unbiased treatment effect estimates that leverages high-dimensional regression adjustments. Our method can be used when the regression model is estimated using the lasso, the elastic net, subset selection, etc. Finally, we extend our analysis to allow for adaptive specification search via cross-validation and flexible nonparametric regression adjustments with machine-learning methods such as random forests or neural networks.
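The abstract names cross-estimation without spelling out its steps, so the following is only a minimal sketch of a generic cross-fitted regression adjustment, written under our own assumptions rather than taken from the paper. The function names (`lasso_fit`, `cross_fit_ate`), the penalty `lam`, the number of folds, and the fold-level formula are all illustrative choices. The idea shown is that the regression models used to adjust a fold are fitted only on units outside that fold, separately for each treatment arm.

```python
import random


def soft_threshold(value, lam):
    """Shrink value toward zero by lam (the lasso proximal step)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_fit(X, y, lam=0.1, n_sweeps=30):
    """Coordinate-descent lasso with an unpenalized intercept.

    Illustrative stand-in for any regression adjustment; returns
    (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    intercept = sum(y) / n
    beta = [0.0] * p
    resid = [yi - intercept for yi in y]
    for _ in range(n_sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            scale = sum(c * c for c in col) / n
            if scale == 0.0:
                continue
            # Correlation of feature j with the partial residual.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new_bj = soft_threshold(rho, lam) / scale
            shift = new_bj - beta[j]
            resid = [r - c * shift for c, r in zip(col, resid)]
            beta[j] = new_bj
        # Re-center so the intercept absorbs the mean residual.
        drift = sum(resid) / n
        intercept += drift
        resid = [r - drift for r in resid]
    return intercept, beta


def predict(model, x):
    intercept, beta = model
    return intercept + sum(b * v for b, v in zip(beta, x))


def cross_fit_ate(X, y, w, n_folds=5, fit=lasso_fit, seed=0):
    """Cross-fitted, regression-adjusted average treatment effect.

    X: list of covariate rows, y: outcomes, w: 0/1 treatment indicators.
    Assumes every fold and every training split contains both arms.
    """
    n = len(y)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[k::n_folds] for k in range(n_folds)]
    estimate = 0.0
    for fold in folds:
        held_out = set(fold)
        # Fit one outcome model per arm using only out-of-fold units.
        models = {}
        for arm in (0, 1):
            rows = [i for i in order if i not in held_out and w[i] == arm]
            models[arm] = fit([X[i] for i in rows], [y[i] for i in rows])
        treated = [i for i in fold if w[i] == 1]
        control = [i for i in fold if w[i] == 0]
        if not treated or not control:
            raise ValueError("each fold needs units from both arms")
        # Predicted effect on the fold, corrected by arm-wise mean residuals.
        gap = sum(predict(models[1], X[i]) - predict(models[0], X[i])
                  for i in fold) / len(fold)
        res1 = sum(y[i] - predict(models[1], X[i]) for i in treated) / len(treated)
        res0 = sum(y[i] - predict(models[0], X[i]) for i in control) / len(control)
        estimate += len(fold) / n * (gap + res1 - res0)
    return estimate
```

Because `fit` is a parameter, the lasso here could be swapped for any other regression routine that returns a model usable by `predict`; the sketch makes no claim to reproduce the finite-sample guarantees stated in the abstract.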