Wu Shuang, Xue Hongqi, Wu Yichao, Wu Hulin
Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY.
Department of Statistics, North Carolina State University, Raleigh, NC.
Stat Sin. 2014 Jul;24(3):1365-1387. doi: 10.5705/ss.2012.316.
In many regression problems, the relationship between the covariates and the response may be nonlinear. Motivated by the problem of reconstructing a gene regulatory network, we consider a sparse high-dimensional additive model whose additive components are known nonlinear functions with unknown parameters. To identify the subset of important covariates, we propose a new method for simultaneous variable selection and parameter estimation that iteratively combines large-scale variable screening (nonlinear independence screening, NLIS) with moderate-scale model selection (the nonnegative garrote, NNG) for nonlinear additive regression. We show that the NLIS procedure possesses the sure screening property and can handle problems with non-polynomial dimensionality. For finite-dimensional problems, we show that the NNG for nonlinear additive regression is selection consistent for the unimportant covariates and estimation consistent for the parameters of the important covariates. The proposed method is applied to simulated data and to a real data example on identifying gene regulations to illustrate its numerical performance.
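The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not state: the nonlinear form `tanh(theta * x)`, a grid search over `theta`, the use of marginal fits as the initial estimates for the garrote, the penalty value `lam`, and a single screening-then-selection pass instead of the iterative scheme. All function names are hypothetical.

```python
import math
import random

def f(x, theta):
    # Assumed nonlinear form with one unknown parameter (illustration only).
    return math.tanh(theta * x)

def marginal_fit(xj, y, thetas):
    """Fit y ~ a + b * f(xj; theta): grid over theta, closed-form (a, b).
    Returns (rss, theta, a, b) for the best theta on the grid."""
    n = len(y)
    ybar = sum(y) / n
    best = None
    for th in thetas:
        g = [f(x, th) for x in xj]
        gbar = sum(g) / n
        sgg = sum((gi - gbar) ** 2 for gi in g)
        if sgg == 0.0:
            continue
        b = sum((gi - gbar) * (yi - ybar) for gi, yi in zip(g, y)) / sgg
        a = ybar - b * gbar
        rss = sum((yi - a - b * gi) ** 2 for gi, yi in zip(g, y))
        if best is None or rss < best[0]:
            best = (rss, th, a, b)
    return best

def screen(X_cols, y, thetas, d):
    """Screening step: rank covariates by marginal RSS, keep the d best."""
    fits = [marginal_fit(col, y, thetas) for col in X_cols]
    order = sorted(range(len(X_cols)), key=lambda j: fits[j][0])
    return order[:d], fits

def nonnegative_garrote(Z_cols, y, lam, n_iter=200):
    """Coordinate descent for
    min_c 0.5 * ||y - sum_j c_j z_j||^2 + lam * sum_j c_j  s.t. c_j >= 0."""
    c = [0.0] * len(Z_cols)
    resid = list(y)
    norms = [sum(z * z for z in col) for col in Z_cols]
    for _ in range(n_iter):
        for j, col in enumerate(Z_cols):
            if norms[j] == 0.0:
                continue
            # Inner product of z_j with the residual that excludes component j.
            rho = sum(z * (r + c[j] * z) for z, r in zip(col, resid))
            new = max(0.0, (rho - lam) / norms[j])
            delta = new - c[j]
            if delta != 0.0:
                resid = [r - delta * z for r, z in zip(resid, col)]
                c[j] = new
    return c

# Synthetic data: only covariates 0 and 3 affect the response.
random.seed(0)
n, p, d = 200, 50, 5
X_cols = [[random.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [2.0 * f(X_cols[0][i], 1.5) - 1.5 * f(X_cols[3][i], 0.8)
     + random.gauss(0, 0.1) for i in range(n)]

thetas = [0.25 * k for k in range(1, 13)]
selected, fits = screen(X_cols, y, thetas, d)

# Initial component estimates (marginal fits, centered), then garrote shrinkage.
ybar = sum(y) / n
y_c = [yi - ybar for yi in y]
Z_cols = []
for j in selected:
    _, th, _, b = fits[j]
    g = [f(x, th) for x in X_cols[j]]
    gbar = sum(g) / n
    Z_cols.append([b * (gi - gbar) for gi in g])

c = nonnegative_garrote(Z_cols, y_c, lam=20.0)
kept = [j for j, cj in zip(selected, c) if cj > 0]
```

Here `selected` holds the indices that survive screening and `kept` holds those whose garrote shrinkage factor stays positive. In practice the initial estimates would more plausibly come from a joint nonlinear least-squares fit on the screened set, and `lam` would be chosen by a data-driven criterion.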