Guan Yu, Yufeng Liu
Guan Yu is a Ph.D. candidate, Department of Statistics and Operations Research. Yufeng Liu is a Professor, Department of Statistics and Operations Research, Carolina Center for Genome Science, Department of Biostatistics, University of North Carolina at Chapel Hill, NC 27599.
J Am Stat Assoc. 2016;111(514):707-720. doi: 10.1080/01621459.2015.1034319. Epub 2016 Aug 18.
With the abundance of high-dimensional data in various disciplines, sparse regularization techniques have become very popular. In this paper, we make use of the structure information among predictors to improve sparse regression models. Typically, such structure information can be modeled by the connectivity of an undirected graph that uses all predictors as its nodes. Most existing methods use this undirected graph edge-by-edge to encourage the regression coefficients of connected predictors to be similar. However, such methods do not directly utilize the neighborhood information of the graph. Furthermore, the more edges the predictor graph has, the more complicated the corresponding regularization term becomes. We instead incorporate the graph information node-by-node. Our proposed method is very general and includes adaptive Lasso, group Lasso, and ridge regression as special cases. Both theoretical and numerical studies demonstrate the effectiveness of the proposed method for simultaneous estimation, prediction, and model selection.
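The contrast between the edge-by-edge and node-by-node use of the predictor graph can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the abstract: the edge-wise penalty is taken to be a sum of absolute coefficient differences over edges, and the node-wise penalty is taken to be a weighted sum of L2 norms over each node's neighborhood (the node plus its neighbors). The function names `neighborhoods`, `edgewise_penalty`, and `nodewise_penalty` are hypothetical, and the paper's actual formulation may differ.

```python
import math

def neighborhoods(p, edges):
    """For each node j in 0..p-1, return the sorted list holding j and its neighbors."""
    nbrs = [{j} for j in range(p)]
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return [sorted(s) for s in nbrs]

def edgewise_penalty(beta, edges):
    """Assumed edge-by-edge form: one term per edge, penalizing coefficient differences."""
    return sum(abs(beta[a] - beta[b]) for a, b in edges)

def nodewise_penalty(beta, groups, weights=None):
    """Assumed node-by-node form: one weighted L2-norm term per neighborhood."""
    if weights is None:
        weights = [1.0] * len(groups)
    return sum(
        w * math.sqrt(sum(beta[k] ** 2 for k in g))
        for w, g in zip(weights, groups)
    )

beta = [3.0, 4.0, 0.0]

# No edges: every neighborhood is a single node, so each term is |beta_j|
# (a Lasso-type penalty; per-node weights would make it adaptive-Lasso-like).
no_edges = neighborhoods(3, [])

# Chain graph 0 - 1 - 2: neighborhoods overlap, but there are still only p terms.
chain_edges = [(0, 1), (1, 2)]
chain = neighborhoods(3, chain_edges)

# Complete graph: every neighborhood is the full predictor set, so each term
# is the L2 norm of the whole coefficient vector.
complete = neighborhoods(3, [(0, 1), (0, 2), (1, 2)])
```

Under these assumptions the node-wise penalty always has one term per predictor, whereas the edge-wise penalty gains a term with every added edge. A graph made of disjoint, fully connected clusters would give one identical group per cluster member, which is the group-Lasso-like case.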