Curtis B. Storlie, Howard D. Bondell, Brian J. Reich, Hao Helen Zhang
Stat Sin. 2011 Apr;21(2):679-705. doi: 10.5705/ss.2011.030a.
Variable selection for multivariate nonparametric regression is an important, yet challenging, problem due, in part, to the infinite dimensionality of the function space. An ideal selection procedure should be automatic, stable, easy to use, and have desirable asymptotic properties. In particular, we define a selection procedure to be nonparametric oracle (np-oracle) if it consistently selects the correct subset of predictors and at the same time estimates the smooth surface at the optimal nonparametric rate, as the sample size goes to infinity. In this paper, we propose a model selection procedure for nonparametric models, and explore the conditions under which the new method enjoys the aforementioned properties. Developed in the framework of smoothing spline ANOVA, our estimator is obtained via solving a regularization problem with a novel adaptive penalty on the sum of functional component norms. Theoretical properties of the new estimator are established. Additionally, numerous simulated and real examples further demonstrate that the new approach substantially outperforms other existing methods in the finite sample setting.
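For orientation, a regularization problem of the kind described above, with an adaptive penalty on the sum of functional component norms, could be written roughly as follows. The notation is assumed for illustration and does not come from the abstract: $f$ is the regression function in a smoothing spline ANOVA space, $P^j f$ is its $j$-th functional component, $\lambda$ is a smoothing parameter, and the weights $w_j$ are built from an initial estimate $\tilde f$ with a power $\gamma > 0$.

```latex
\hat f \;=\; \arg\min_{f}\;
  \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(\mathbf{x}_i)\bigr)^2
  \;+\; \lambda \sum_{j=1}^{p} w_j \,\bigl\lVert P^j f \bigr\rVert,
\qquad
w_j \;=\; \bigl\lVert P^j \tilde f \bigr\rVert_{L_2}^{-\gamma}.
```

Under this sketch, a component that looks small in the initial fit receives a large weight and is more likely to be shrunk to exactly zero, which is how a penalty of this form can select variables while estimating the surface.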