A Robust Model-Free Feature Screening Method for Ultrahigh-Dimensional Data.

Authors

Xue Jingnan, Liang Faming

Affiliations

Department of Statistics, Texas A&M University, College Station, TX 77843.

Department of Biostatistics, University of Florida, Gainesville, FL 32611.

Publication information

J Comput Graph Stat. 2017;26(4):803-813. doi: 10.1080/10618600.2017.1328364. Epub 2017 Oct 9.

Abstract

Feature screening plays an important role in dimension reduction for ultrahigh-dimensional data. In this paper, we introduce a new feature screening method and establish its sure independence screening property in the ultrahigh-dimensional setting. The proposed method is based on the nonparanormal transformation and the Henze-Zirkler test: it first transforms the response variable and the features to Gaussian random variables using the nonparanormal transformation, and then tests the dependence between the response variable and the features using the Henze-Zirkler test. The proposed method has at least two merits. First, it is model-free, so no particular model structure needs to be specified. Second, it is condition-free, in that it requires no extra conditions beyond some regularity conditions for high-dimensional feature screening. The numerical results indicate that, compared with existing methods, the proposed method is more robust to data generated from heavy-tailed distributions and/or complex models with interaction variables. The proposed method is applied to the screening of anticancer drug response genes.
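The two-step procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not confirm: the nonparanormal transformation is approximated by plain rank-based normal scores (ties ignored), the Henze-Zirkler statistic is computed for each (response, feature) pair against the standard bivariate normal without re-standardizing by the sample covariance, and the smoothing parameter uses the usual Henze-Zirkler default. The function names `nonparanormal_scores`, `hz_statistic`, and `screen`, as well as the simulated data, are hypothetical.

```python
import math
import random
from statistics import NormalDist


def nonparanormal_scores(values):
    """Rank-based normal scores: Phi^{-1}(rank / (n + 1)). Assumes no ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scores[idx] = std_normal.inv_cdf(rank / (n + 1))
    return scores


def hz_statistic(u, v):
    """Henze-Zirkler-type statistic of the pairs (u_i, v_i) against N(0, I_2).

    Equals n times the weighted L2 distance between the empirical
    characteristic function and exp(-|t|^2 / 2); larger means further
    from independent standard normals.
    """
    n, d = len(u), 2
    # beta^2 for the default beta = (1 / sqrt(2)) * ((2d + 1) n / 4)^(1 / (d + 4))
    beta2 = 0.5 * ((2 * d + 1) * n / 4.0) ** (2.0 / (d + 4))
    pair_sum = 0.0
    for i in range(n):
        for j in range(n):
            sq_dist = (u[i] - u[j]) ** 2 + (v[i] - v[j]) ** 2
            pair_sum += math.exp(-0.5 * beta2 * sq_dist)
    single_sum = sum(
        math.exp(-0.5 * beta2 / (1.0 + beta2) * (a * a + b * b))
        for a, b in zip(u, v)
    )
    return (
        pair_sum / n
        - 2.0 * single_sum / (1.0 + beta2) ** (d / 2)
        + n / (1.0 + 2.0 * beta2) ** (d / 2)
    )


def screen(y, features, keep):
    """Rank features by their statistic with the response; keep the top ones."""
    y_scores = nonparanormal_scores(y)
    stats = [hz_statistic(y_scores, nonparanormal_scores(x)) for x in features]
    ranked = sorted(range(len(features)), key=lambda j: stats[j], reverse=True)
    return ranked[:keep], stats


# Simulated example: only feature 0 drives the response, through a
# nonlinear monotone link that produces a skewed response.
random.seed(0)
n, p = 100, 10
features = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
y = [math.exp(features[0][i]) + 0.05 * random.gauss(0.0, 1.0) for i in range(n)]

selected, stats = screen(y, features, keep=3)
print("top-ranked feature indices:", selected)
```

In this toy setting the informative feature should receive by far the largest statistic, because the transformed pair is almost perfectly dependent and therefore far from independent standard normals. The double loop costs O(n^2) per feature, so a vectorized version would be preferable when the number of features is truly ultrahigh.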

Similar articles

1
A Robust Model-Free Feature Screening Method for Ultrahigh-Dimensional Data.
J Comput Graph Stat. 2017;26(4):803-813. doi: 10.1080/10618600.2017.1328364. Epub 2017 Oct 9.
2
Model-Free Conditional Independence Feature Screening For Ultrahigh Dimensional Data.
Sci China Math. 2017 Mar;60(3):551-568. doi: 10.1007/s11425-016-0186-8. Epub 2016 Dec 29.
3
Model-Free Feature Screening for Ultrahigh Dimensional Discriminant Analysis.
J Am Stat Assoc. 2015 Jun 1;110(510):630-641. doi: 10.1080/01621459.2014.920256.
4
Feature Screening via Distance Correlation Learning.
J Am Stat Assoc. 2012 Jul 1;107(499):1129-1139. doi: 10.1080/01621459.2012.695654.
5
Feature Screening for High-Dimensional Variable Selection in Generalized Linear Models.
Entropy (Basel). 2023 May 26;25(6):851. doi: 10.3390/e25060851.
6
Group Feature Screening via the F Statistic.
Commun Stat Simul Comput. 2022;51(4):1921-1931. doi: 10.1080/03610918.2019.1691223. Epub 2019 Nov 26.
7
MODEL-FREE FORWARD SCREENING VIA CUMULATIVE DIVERGENCE.
J Am Stat Assoc. 2020;115(531):1393-1405. doi: 10.1080/01621459.2019.1632078. Epub 2019 Jul 22.
8
Censored cumulative residual independent screening for ultrahigh-dimensional survival data.
Lifetime Data Anal. 2018 Apr;24(2):273-292. doi: 10.1007/s10985-017-9395-2. Epub 2017 May 26.
9
Model-free slice screening for ultrahigh-dimensional survival data.
J Appl Stat. 2020 Jun 2;48(10):1755-1774. doi: 10.1080/02664763.2020.1772734. eCollection 2021.
10
A selective overview of feature screening for ultrahigh-dimensional data.
Sci China Math. 2015 Oct;58(10):2033-2054. doi: 10.1007/s11425-015-5062-9. Epub 2015 Aug 22.

Cited by

1
Extended fiducial inference: toward an automated process of statistical inference.
J R Stat Soc Series B Stat Methodol. 2024 Aug 5;87(1):98-131. doi: 10.1093/jrsssb/qkae082. eCollection 2025 Feb.
3
Markov Neighborhood Regression for High-Dimensional Inference.
J Am Stat Assoc. 2020;117(539):1200-1214. doi: 10.1080/01621459.2020.1841646. Epub 2020 Oct 28.
4
Bayesian Neural Networks for Selection of Drug Sensitive Genes.
J Am Stat Assoc. 2018;113(523):955-972. doi: 10.1080/01621459.2017.1409122. Epub 2018 Jun 28.
5
Drug sensitivity prediction with high-dimensional mixture regression.
PLoS One. 2019 Feb 27;14(2):e0212108. doi: 10.1371/journal.pone.0212108. eCollection 2019.

References

2
Model-Free Feature Screening for Ultrahigh Dimensional Discriminant Analysis.
J Am Stat Assoc. 2015 Jun 1;110(510):630-641. doi: 10.1080/01621459.2014.920256.
3
Feature Screening via Distance Correlation Learning.
J Am Stat Assoc. 2012 Jul 1;107(499):1129-1139. doi: 10.1080/01621459.2012.695654.
4
A meta-analysis approach for characterizing pan-cancer mechanisms of drug sensitivity in cell lines.
PLoS One. 2014 Jul 18;9(7):e103050. doi: 10.1371/journal.pone.0103050. eCollection 2014.
6
Putative DNA/RNA helicase Schlafen-11 (SLFN11) sensitizes cancer cells to DNA-damaging agents.
Proc Natl Acad Sci U S A. 2012 Sep 11;109(37):15030-5. doi: 10.1073/pnas.1205943109. Epub 2012 Aug 27.
7
Model-Free Feature Screening for Ultrahigh Dimensional Data.
J Am Stat Assoc. 2011 Jan 1;106(496):1464-1475. doi: 10.1198/jasa.2011.tm10563. Epub 2012 Jan 24.
8
9
Nonparametric Independence Screening in Sparse Ultra-High Dimensional Additive Models.
J Am Stat Assoc. 2011 Jun;106(494):544-557. doi: 10.1198/jasa.2011.tm09779.
