Larry Wasserman, Kathryn Roeder
Department of Statistics, Carnegie Mellon University, Pittsburgh
Ann Stat. 2009 Jan 1;37(5A):2178-2201. doi: 10.1214/08-aos646.
This paper explores the following question: what kind of statistical guarantees can be given when doing variable selection in high-dimensional models? In particular, we look at the error rates and power of some multi-stage regression methods. In the first stage we fit a set of candidate models. In the second stage we select one model by cross-validation. In the third stage we use hypothesis testing to eliminate some variables. We refer to the first two stages as "screening" and the last stage as "cleaning." We consider three screening methods: the lasso, marginal regression, and forward stepwise regression. Our method gives consistent variable selection under certain conditions.
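The three stages can be illustrated with a minimal sketch. It is not the authors' implementation and makes several assumptions that the abstract does not state: the data are split into three equal parts with one part per stage, marginal regression is the screening method, held-out squared prediction error stands in for cross-validation, and the cleaning step uses a Bonferroni-corrected normal approximation to the t-test. The function names (`ols`, `marginal_ranking`, `screen_and_clean`) and the parameters `max_size` and `alpha` are hypothetical, and the data are simulated.

```python
import random
from statistics import NormalDist

def ols(X, y):
    """Least squares without intercept; returns (coefficients, standard errors)."""
    n, m = len(X), len(X[0])
    # Augmented system [X'X | X'y | I], reduced by Gauss-Jordan elimination.
    A = [[sum(r[i] * r[j] for r in X) for j in range(m)]
         + [sum(r[i] * t for r, t in zip(X, y))]
         + [float(i == j) for j in range(m)] for i in range(m)]
    for c in range(m):
        piv = max(range(c, m), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        d = A[c][c]
        A[c] = [v / d for v in A[c]]
        for k in range(m):
            if k != c:
                f = A[k][c]
                A[k] = [v - f * w for v, w in zip(A[k], A[c])]
    beta = [A[i][m] for i in range(m)]
    rss = sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(X, y))
    sigma2 = rss / (n - m)
    # Columns m+1.. now hold the inverse of X'X.
    se = [(sigma2 * A[i][m + 1 + i]) ** 0.5 for i in range(m)]
    return beta, se

def cols(X, idx):
    """Keep only the columns listed in idx."""
    return [[r[j] for j in idx] for r in X]

def marginal_ranking(X, y):
    """Rank variables by |sum_i x_ij * y_i|, largest first (columns assumed standardized)."""
    p = len(X[0])
    score = [abs(sum(r[j] * t for r, t in zip(X, y))) for j in range(p)]
    return sorted(range(p), key=lambda j: -score[j])

def screen_and_clean(X, y, max_size=5, alpha=0.05):
    n = len(y)
    a, b = n // 3, 2 * n // 3
    X1, y1, X2, y2, X3, y3 = X[:a], y[:a], X[a:b], y[a:b], X[b:], y[b:]
    # Stage 1: candidate models are the top-k variables for k = 1..max_size.
    order = marginal_ranking(X1, y1)
    # Stage 2: choose the candidate with the smallest held-out prediction error.
    best, best_err = None, float("inf")
    for k in range(1, max_size + 1):
        S = order[:k]
        beta, _ = ols(cols(X1, S), y1)
        err = sum((t - sum(bj * r[j] for bj, j in zip(beta, S))) ** 2
                  for r, t in zip(X2, y2))
        if err < best_err:
            best, best_err = S, err
    # Stage 3: keep variables whose |t| exceeds a Bonferroni-corrected threshold.
    beta, se = ols(cols(X3, best), y3)
    z = NormalDist().inv_cdf(1 - alpha / (2 * len(best)))
    return sorted(j for j, bj, s in zip(best, beta, se) if abs(bj / s) > z)

# Simulated example: only variables 0 and 3 carry signal.
random.seed(0)
n, p = 300, 20
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * r[0] - 2 * r[3] + random.gauss(0, 1) for r in X]
selected = screen_and_clean(X, y)
print(selected)
```

Using a separate part of the data for the cleaning stage keeps the hypothesis tests independent of the screening step in this sketch, which is what makes a simple multiplicity correction plausible there. Swapping `marginal_ranking` for a lasso path or a forward stepwise ordering would give the other two screening variants named above.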