Zhu Liping, Li Lexin, Li Runze, Zhu Lixing
J Am Stat Assoc. 2011 Jan 1;106(496):1464-1475. doi: 10.1198/jasa.2011.tm10563. Epub 2012 Jan 24.
With the recent explosion of scientific data of unprecedented size and complexity, feature ranking and screening are playing an increasingly important role in many scientific studies. In this article, we propose a novel feature screening procedure under a unified model framework, which covers a wide variety of commonly used parametric and semiparametric models. The new method does not require imposing a specific model structure on the regression function, and is thus particularly appealing for ultrahigh-dimensional regressions, where there are a huge number of candidate predictors but little information about the actual model form. We demonstrate that, with the number of predictors growing at an exponential rate with the sample size, the proposed procedure possesses ranking consistency, which is useful in its own right and can also lead to consistency in selection. The new procedure is computationally efficient and simple, and exhibits competent empirical performance in our intensive simulation studies and real data analysis.
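To make the screening idea concrete, here is a minimal sketch of a model-free marginal ranking procedure of the kind the abstract describes. The specific marginal utility used below, omega_k = E[{E(X_k 1(Y < y))}^2] with standardized predictors, and the cutoff d = n / log(n) are assumptions for illustration; the abstract does not spell out the paper's exact screening statistic.

```python
import numpy as np

def marginal_utilities(X, y):
    """Estimate a model-free marginal utility for each predictor.

    Assumed utility (illustrative): omega_k = E[{E(X_k * 1(Y < y))}^2],
    estimated by its sample analogue with standardized predictors.
    """
    n, _ = X.shape
    # Standardize each column so utilities are comparable across predictors.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # ind[i, j] = 1 if y_i < y_j; each column j fixes a cut point y_j.
    ind = (y[:, None] < y[None, :]).astype(float)
    # inner[k, j] = sample mean of X_k * 1(Y < y_j), shape (p, n).
    inner = Xs.T @ ind / n
    # Average the squared inner means over the cut points y_j.
    return (inner ** 2).mean(axis=1)

def screen(X, y, d=None):
    """Rank predictors by utility and keep the top d (default n / log n)."""
    n = X.shape[0]
    if d is None:
        d = int(np.floor(n / np.log(n)))
    omega = marginal_utilities(X, y)
    ranking = np.argsort(omega)[::-1]  # largest utility first
    return ranking[:d], omega

# Toy usage: only the first two of 1000 predictors drive the response.
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.standard_normal((n, p))
y = np.exp(X[:, 0]) + 2 * X[:, 1] + rng.standard_normal(n)
kept, omega = screen(X, y)
print("top 5 ranked predictors:", kept[:5])
```

Because the utility only involves indicators of the response and marginal moments of each predictor, no regression function needs to be specified, which is the sense in which such screening is model-free; ranking consistency then means the truly active predictors receive the highest utilities with probability tending to one.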