Dai Ying, Ma Shuangge
School of Public Health, Yale University, New Haven, CT, USA.
J Nonparametr Stat. 2012 Jun 1;24(2):283-298. doi: 10.1080/10485252.2012.661054. Epub 2012 Apr 30.
Semiparametric regression models with multiple covariates are commonly encountered. When some covariates are not associated with the response variable, variable selection may lead to sparser models, more lucid interpretations, and more accurate estimation. In this study, we adopt a sieve approach for estimating nonparametric covariate effects in semiparametric regression models, and a two-step iterated penalization approach for variable selection. In the first step, a mixture of the Lasso and group Lasso penalties is employed to conduct the first-round variable selection and obtain the initial estimate. In the second step, a mixture of the weighted Lasso and weighted group Lasso penalties, with weights constructed using the initial estimate, is employed for variable selection. We show that the proposed iterated approach has the variable selection consistency property, even when the number of unknown parameters diverges with the sample size. Numerical studies, including simulation and analysis of a diabetes dataset, show satisfactory performance of the proposed approach.
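The two-step iterated penalization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not specify: a squared-error loss, a centered polynomial basis standing in for the sieve (`poly_sieve_basis` is a hypothetical helper), a single tuning parameter per step (`lam1`, `lam2`, both assumed positive), inverse-norm weights in the second step, and a plain proximal gradient solver. Parametric coefficients are treated as groups of size one, so the group penalty reduces to the Lasso penalty for them.

```python
import numpy as np


def poly_sieve_basis(z, degree=3):
    """Centered polynomial basis z, z^2, ..., z^degree (assumed stand-in for a sieve basis)."""
    B = np.column_stack([z ** d for d in range(1, degree + 1)])
    return B - B.mean(axis=0)


def group_soft_threshold(v, t):
    """Proximal operator of t * ||v||_2; for one coefficient this is soft-thresholding."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def fit_penalized(X, y, groups, lam, weights, n_iter=2000):
    """Minimize (1/(2n))||y - X b||^2 + lam * sum_g weights[g] * ||b_g||_2
    by proximal gradient descent with a fixed step size."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y)) / n
        for g, w in zip(groups, weights):
            # An infinite weight forces the group to zero (dropped in step one).
            t = np.inf if np.isinf(w) else step * lam * w
            z[g] = group_soft_threshold(z[g], t)
        beta = z
    return beta


def two_step(X, y, groups, lam1, lam2):
    """Step 1: unweighted Lasso / group Lasso mixture gives the initial estimate.
    Step 2: the same penalty with weights 1 / ||initial estimate of the group||
    (an assumed weight form)."""
    beta_init = fit_penalized(X, y, groups, lam1, np.ones(len(groups)))
    norms = np.array([np.linalg.norm(beta_init[g]) for g in groups])
    with np.errstate(divide="ignore"):
        weights = 1.0 / norms  # inf for groups removed in step one
    beta = fit_penalized(X, y, groups, lam2, weights)
    return beta_init, beta
```

In this sketch, `X` would hold the parametric covariates followed by the basis columns of each nonparametric covariate, and `groups` would list the column indices belonging to each covariate, for example `[[0], [1], [2], [3, 4, 5], [6, 7, 8]]` for three parametric covariates and two nonparametric covariates with three basis functions each. A covariate is considered selected when its coefficient group is nonzero in the second-step estimate.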