Wei Fengrong, Huang Jian, Li Hongzhe
Department of Mathematics, University of West Georgia, 1601 Maple Street, Carrollton, GA 30118, USA.
Stat Sin. 2011 Oct 1;21(4):1515-1540. doi: 10.5705/ss.2009.316.
Nonparametric varying coefficient models are useful for studying the time-dependent effects of variables. Many procedures have been developed for estimation and variable selection in such models. However, existing work has focused on the case where the number of variables is fixed or smaller than the sample size. In this paper, we consider the problem of variable selection and estimation in varying coefficient models in sparse, high-dimensional settings where the number of variables can be larger than the sample size. We apply the group Lasso and basis function expansion to simultaneously select the important variables and estimate the nonzero varying coefficient functions. Under appropriate conditions, we show that the group Lasso selects a model of the right order of dimensionality, selects all variables whose coefficient functions have norms greater than a certain threshold level, and is estimation consistent. However, the group Lasso is in general not selection consistent and tends to select variables that are not important in the model. To improve the selection results, we apply the adaptive group Lasso. We show that, under suitable conditions, the adaptive group Lasso has the oracle selection property in the sense that it correctly selects the important variables with probability converging to one. In contrast, the group Lasso does not possess such an oracle property. Both approaches are evaluated using simulation and demonstrated on a data example.
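The two-step idea described above (basis expansion with a group Lasso fit, followed by a reweighted adaptive group Lasso fit) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the polynomial basis (a stand-in for whatever spline basis the paper uses), the penalized least-squares criterion (1/2n)||y - Z g||^2 + lam * sum_j w_j ||g_j||_2, the proximal-gradient solver, the weights 1/||g_j|| from the first-stage fit, the simulated data, and all function names are assumptions made for the example.

```python
import numpy as np

def basis(t, k=4):
    # Assumed basis: 1, t, ..., t^(k-1) on [0, 1] (a simple stand-in for splines).
    return np.vander(t, k, increasing=True)

def design(X, t, k=4):
    # Columns j*k .. j*k+k-1 hold x_ij * B_l(t_i), so variable j forms one group.
    B = basis(t, k)
    n, p = X.shape
    return (X[:, :, None] * B[:, None, :]).reshape(n, p * k)

def group_lasso(Z, y, k, lam, weights=None, n_iter=2000):
    # Proximal gradient for (1/2n)||y - Z g||^2 + lam * sum_j w_j ||g_j||_2.
    n, d = Z.shape
    p = d // k
    w = np.ones(p) if weights is None else weights
    step = n / np.linalg.norm(Z, 2) ** 2  # 1 / Lipschitz constant of the gradient
    g = np.zeros(d)
    for _ in range(n_iter):
        grad = Z.T @ (Z @ g - y) / n
        u = (g - step * grad).reshape(p, k)
        norms = np.linalg.norm(u, axis=1)
        # Block soft-thresholding: shrink each group's norm, or zero the group.
        shrink = np.maximum(0.0, 1.0 - step * lam * w / np.maximum(norms, 1e-12))
        g = (u * shrink[:, None]).ravel()
    return g.reshape(p, k)

def adaptive_weights(G):
    # w_j = 1 / ||g_j||; groups dropped in the first stage get infinite weight.
    norms = np.linalg.norm(G, axis=1)
    return np.divide(1.0, norms, out=np.full_like(norms, np.inf), where=norms > 0)

# Simulated data (hypothetical): only variables 0 and 1 have nonzero coefficients.
rng = np.random.default_rng(0)
n, p, k, lam = 200, 30, 4, 0.1
t = rng.uniform(0.0, 1.0, n)
X = rng.normal(size=(n, p))
y = X[:, 0] * (3 * t**2) + X[:, 1] * (1 + t) + 0.5 * rng.normal(size=n)

Z = design(X, t, k)
G_gl = group_lasso(Z, y, k, lam)                                   # group Lasso
G_agl = group_lasso(Z, y, k, lam, weights=adaptive_weights(G_gl))  # adaptive step

sel_gl = np.flatnonzero(np.linalg.norm(G_gl, axis=1) > 0)
sel_agl = np.flatnonzero(np.linalg.norm(G_agl, axis=1) > 0)
print("group Lasso selected:", sel_gl)
print("adaptive group Lasso selected:", sel_agl)
```

Each row of `G_agl` holds the basis coefficients of one variable, so the estimated coefficient function for variable j at times `t` is `basis(t, k) @ G_agl[j]`. In this sketch the adaptive step can only remove variables, since a group that is zero in the first stage receives an infinite weight.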