Hao Ning, Dong Bin, Fan Jianqing
University of Arizona, University of Arizona, and Princeton University.
J R Stat Soc Series B Stat Methodol. 2015 Sep 1;77(4):827-851. doi: 10.1111/rssb.12092. Epub 2014 Nov 7.
Many high-dimensional classification techniques based on sparse linear discriminant analysis (LDA) have been proposed in the literature. To use them efficiently, sparsity of the linear classifier is a prerequisite. However, such sparsity may not hold in many applications, and the data must be rotated to create it. In this paper, we propose a family of rotations that create the required sparsity. The basic idea is to first rotate the data using the principal components of the sample covariance matrix of the pooled samples (or of its variants), and then apply an existing high-dimensional classifier. This rotate-and-solve procedure can be combined with any existing classifier and is robust to the sparsity level of the true model. We show that these rotations do create the sparsity needed for high-dimensional classification, and we provide a theoretical explanation of why such a rotation works empirically. The effectiveness of the proposed method is demonstrated on a number of simulated and real data examples, which clearly show its improvements over some popular high-dimensional classification rules.
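The rotate-and-solve idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation. It rotates with the eigenvectors of the pooled within-class sample covariance, which is one possible reading of "the sample covariance matrix of the pooled samples". The "existing classifier" is a simple stand-in: a diagonal LDA rule with a soft-thresholded mean difference. The function names, the threshold of 0.5, and the simulated setting are all hypothetical. The intuition: if Σ = QΛQᵀ and the mean difference δ lies along one eigenvector, then the Bayes direction Σ⁻¹δ is dense in the original coordinates. After rotating by Q, however, it has a single nonzero entry, and the rotated features have (nearly) diagonal covariance Λ, which is the setting where an independence rule is appropriate.

```python
import numpy as np


def pca_rotation(X1, X2):
    """Eigenvectors of the pooled within-class sample covariance.

    Assumption: each class is centered at its own mean before pooling;
    the covariance variant used in the paper may differ.
    """
    R = np.vstack([X1 - X1.mean(axis=0), X2 - X2.mean(axis=0)])
    S = R.T @ R / (R.shape[0] - 2)
    _, U = np.linalg.eigh(S)  # columns are orthonormal eigenvectors
    return U


def fit_sparse_independence_rule(Z1, Z2, threshold):
    """Stand-in sparse classifier: diagonal LDA with a soft-thresholded
    mean difference (hypothetical choice, not taken from the paper)."""
    mu1, mu2 = Z1.mean(axis=0), Z2.mean(axis=0)
    d = mu1 - mu2
    d = np.sign(d) * np.maximum(np.abs(d) - threshold, 0.0)  # soft threshold
    resid = np.vstack([Z1 - mu1, Z2 - mu2])
    var = resid.var(axis=0, ddof=2)  # pooled per-coordinate variances
    return d / var, (mu1 + mu2) / 2


def predict(w, midpoint, Z):
    """Assign class 1 if w^T (z - midpoint) > 0, else class 2."""
    return np.where((Z - midpoint) @ w > 0, 1, 2)


def rotate_and_solve(X1, X2, threshold):
    """Rotate with U, then fit the sparse rule on the rotated features."""
    U = pca_rotation(X1, X2)
    w, midpoint = fit_sparse_independence_rule(X1 @ U, X2 @ U, threshold)
    return U, w, midpoint


# Hypothetical simulation: Sigma = Q diag(lam) Q^T with a random orthogonal Q,
# and the mean difference along Q's first column, so Sigma^{-1} delta is dense
# in the original coordinates.
rng = np.random.default_rng(0)
p, n = 20, 200
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
lam = np.concatenate([[0.2], np.linspace(1.0, 3.0, p - 1)])
Sigma = Q @ np.diag(lam) @ Q.T
delta = Q[:, 0]
L = np.linalg.cholesky(Sigma)


def sample(mean, m):
    # Rows are independent draws from N(mean, Sigma).
    return mean + rng.standard_normal((m, p)) @ L.T


X1, X2 = sample(delta / 2, n), sample(-delta / 2, n)
T1, T2 = sample(delta / 2, 1000), sample(-delta / 2, 1000)

beta = np.linalg.solve(Sigma, delta)
print("nonzeros of Sigma^{-1} delta:", np.count_nonzero(np.abs(beta) > 1e-8))

U, w, midpoint = rotate_and_solve(X1, X2, threshold=0.5)
acc = (np.mean(predict(w, midpoint, T1 @ U) == 1)
       + np.mean(predict(w, midpoint, T2 @ U) == 2)) / 2
print("nonzeros of rotated rule:", np.count_nonzero(w), "accuracy:", round(acc, 3))
```

In this setting the soft threshold removes almost every rotated coordinate except the one aligned with δ. Applied directly to the unrotated data, the same thresholding would instead discard much of the signal, because δ is spread thinly across all 20 original coordinates.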