Huang Jian, Zhang Cun-Hui
Department of Statistics and Actuarial Science, University of Iowa, Iowa City, IA 52242, USA.
Department of Statistics and Biostatistics, Rutgers University, Piscataway, New Jersey 08854, USA.
J Mach Learn Res. 2012 Jun 1;13:1839-1864.
The ℓ₁-penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression, and these have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted ℓ₁-penalized estimators for convex loss functions of a general form, including the generalized linear models. We study the estimation, prediction, selection and sparsity properties of the weighted ℓ₁-penalized estimator in sparse, high-dimensional settings where the number of predictors p can be much larger than the sample size n. The adaptive Lasso is considered as a special case. A multistage method is developed to approximate concave regularized estimation by applying an adaptive Lasso recursively. We provide prediction and estimation oracle inequalities for single- and multi-stage estimators, a general selection consistency theorem, and an upper bound for the dimension of the Lasso estimator. Important models, including linear regression, logistic regression and log-linear models, are used throughout to illustrate the applications of the general results.
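The following is a minimal sketch of the multistage idea described above (an adaptive Lasso applied recursively, with weights derived from a concave penalty), assuming a least-squares loss and the MCP derivative as the weight function. The helper names (mcp_derivative, weighted_lasso, multistage_adaptive_lasso) and the choice of solver are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def mcp_derivative(t, lam, gamma=3.0):
    """Derivative of the minimax concave penalty, used to build adaptive weights."""
    return np.maximum(lam - t / gamma, 0.0)

def weighted_lasso(X, y, lam, weights, eps=1e-8):
    """Weighted l1 estimator: penalize sum_j w_j * |beta_j| via column rescaling."""
    w = np.maximum(weights, eps)          # floor weights so (nearly) unpenalized coords stay finite
    Xw = X / w                            # rescale columns so a standard Lasso solver applies
    fit = Lasso(alpha=lam, fit_intercept=False).fit(Xw, y)
    return fit.coef_ / w                  # undo the rescaling to recover beta

def multistage_adaptive_lasso(X, y, lam, n_stages=3):
    p = X.shape[1]
    beta = weighted_lasso(X, y, lam, np.ones(p))      # stage 1: plain Lasso (unit weights)
    for _ in range(n_stages - 1):                     # later stages: adaptive Lasso
        w = mcp_derivative(np.abs(beta), lam) / lam   # weights from the concave penalty derivative
        beta = weighted_lasso(X, y, lam, w)
    return beta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 100, 200                                   # p much larger than n
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p); beta_true[:5] = 2.0
    y = X @ beta_true + 0.5 * rng.standard_normal(n)
    beta_hat = multistage_adaptive_lasso(X, y, lam=0.1)
    print("selected:", np.nonzero(np.abs(beta_hat) > 1e-6)[0])
```

With this weight choice, coefficients that are large after the previous stage receive little or no penalty at the next stage, which is how the recursion approximates concave (here MCP-type) regularized estimation with a sequence of convex weighted ℓ₁ problems.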