Xue Jingnan, Liang Faming
Department of Statistics, Texas A&M University, College Station, TX 77843.
Department of Biostatistics, University of Florida, Gainesville, FL 32611.
J Comput Graph Stat. 2017;26(4):803-813. doi: 10.1080/10618600.2017.1328364. Epub 2017 Oct 9.
Feature screening plays an important role in dimension reduction for ultrahigh-dimensional data. In this paper, we introduce a new feature screening method and establish its sure independence screening property in the ultrahigh-dimensional setting. The proposed method is based on the nonparanormal transformation and the Henze-Zirkler test: it first transforms the response variable and the features to Gaussian random variables using the nonparanormal transformation, and then tests the dependence between the response variable and each feature using the Henze-Zirkler test. The proposed method has at least two merits. First, it is model-free, so no particular model structure needs to be specified. Second, it is condition-free, in that it requires no conditions beyond some regularity conditions for high-dimensional feature screening. The numerical results indicate that, compared with existing methods, the proposed method is more robust to data generated from heavy-tailed distributions and/or complex models with interaction variables. The proposed method is applied to the screening of anticancer drug response genes.
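The two-step procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: the nonparanormal transformation is approximated by plain rank-based normal scores (no truncation of the empirical distribution function), the Henze-Zirkler statistic is computed against a bivariate N(0, I_2) reference (since independent standard normal margins imply that joint law), the smoothing parameter `beta` is fixed at 1, and features are simply ranked by the statistic. The function names `normal_scores`, `hz_statistic`, and `screen_features` are hypothetical.

```python
import math
from statistics import NormalDist


def normal_scores(values):
    """Rank-based Gaussianization: z_i = Phi^{-1}(rank_i / (n + 1)).

    Simplified stand-in for the nonparanormal transformation (assumption:
    no truncation of the empirical CDF; ties are broken by input order).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    inv_cdf = NormalDist().inv_cdf
    z = [0.0] * n
    for rank, i in enumerate(order, start=1):
        z[i] = inv_cdf(rank / (n + 1))
    return z


def hz_statistic(u, v, beta=1.0):
    """Henze-Zirkler-type distance between the sample (u_i, v_i) and N(0, I_2).

    Assumption: mean 0 and identity covariance are plugged in rather than
    estimated, so any dependence between u and v inflates the statistic.
    """
    n, d = len(u), 2
    b2 = beta * beta
    # Double sum over pairs: exp(-beta^2 / 2 * ||z_i - z_j||^2).
    pair_sum = 0.0
    for i in range(n):
        for j in range(n):
            dist2 = (u[i] - u[j]) ** 2 + (v[i] - v[j]) ** 2
            pair_sum += math.exp(-0.5 * b2 * dist2)
    # Single sum: exp(-beta^2 / (2 (1 + beta^2)) * ||z_i||^2).
    single_sum = sum(
        math.exp(-0.5 * b2 / (1.0 + b2) * (a * a + b * b)) for a, b in zip(u, v)
    )
    return (
        pair_sum / n**2
        - 2.0 * (1.0 + b2) ** (-d / 2) * single_sum / n
        + (1.0 + 2.0 * b2) ** (-d / 2)
    )


def screen_features(X, y, keep):
    """Return the indices of the `keep` features with the largest statistic.

    X is a list of feature columns, each the same length as y.
    """
    zy = normal_scores(y)
    scores = [hz_statistic(normal_scores(col), zy) for col in X]
    ranked = sorted(range(len(X)), key=lambda j: scores[j], reverse=True)
    return ranked[:keep]
```

Calling `screen_features(X, y, keep=k)` returns the `k` feature indices with the largest statistics. The pairwise sum costs O(n^2) per feature, so a vectorized implementation would be preferable for data of realistic size.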