Ultrahigh dimensional feature selection: beyond the linear model.

Authors

Jianqing Fan, Richard Samworth, Yichao Wu

Affiliation

Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08540 USA.

Publication Information

J Mach Learn Res. 2009;10:2013-2038.

Abstract

Variable selection in high-dimensional space characterizes many contemporary problems in scientific discovery and decision making. Many frequently used techniques are based on independence screening; examples include correlation ranking (Fan and Lv, 2008) or feature selection using a two-sample t-test in high-dimensional classification (Tibshirani et al., 2003). Within the context of the linear model, Fan and Lv (2008) showed that this simple correlation ranking possesses a sure independence screening property under certain conditions, and that its revision, called iterative sure independence screening (ISIS), is needed when the features are marginally unrelated but jointly related to the response variable. In this paper, we extend ISIS, without explicit definition of residuals, to a general pseudo-likelihood framework, which includes generalized linear models as a special case. Even in the least-squares setting, the new method improves ISIS by allowing feature deletion in the iterative process. Our technique allows us to select important features in high-dimensional classification where the popularly used two-sample t-method fails. A new technique is introduced to reduce the false selection rate in the feature screening stage. Several simulated and two real data examples are presented to illustrate the methodology.
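To make the screening step concrete, the following is a minimal sketch of the simplest form of sure independence screening discussed above: ranking features by absolute marginal correlation with the response and keeping only the top d of them. The function name, the toy data, and the cut-off d near n/log(n) are illustrative assumptions, not code from the paper.

# Marginal correlation screening (SIS), in the spirit of Fan and Lv (2008).
# Illustrative sketch only, not the authors' implementation.
import numpy as np

def sis_correlation_ranking(X, y, d):
    """Return indices of the d features with the largest absolute
    marginal correlation with the response y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each feature
    yc = (y - y.mean()) / y.std()
    corr = np.abs(Xc.T @ yc) / len(y)           # |marginal Pearson correlation|
    return np.argsort(corr)[::-1][:d]           # keep the top-d ranked features

# Toy example: p >> n, with only the first three features truly active.
rng = np.random.default_rng(0)
n, p = 100, 2000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + rng.standard_normal(n)
kept = sis_correlation_ranking(X, y, d=int(n / np.log(n)))
print(sorted(kept))

The paper's contribution, as summarized in the abstract, is to iterate this kind of marginal screening within a general pseudo-likelihood framework (covering generalized linear models and classification losses), recruiting new features conditional on those already selected, allowing previously selected features to be deleted, and applying an additional device to control the false selection rate.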

Similar Articles

Feature Screening via Distance Correlation Learning.
J Am Stat Assoc. 2012 Jul 1;107(499):1129-1139. doi: 10.1080/01621459.2012.695654.

A selective overview of feature screening for ultrahigh-dimensional data.
Sci China Math. 2015 Oct;58(10):2033-2054. doi: 10.1007/s11425-015-5062-9. Epub 2015 Aug 22.

Feature screening for case-cohort studies with failure time outcome.
Scand Stat Theory Appl. 2021 Mar;48(1):349-370. doi: 10.1111/sjos.12503. Epub 2020 Nov 16.

Cited By

A Model-free Approach for Testing Association.
J R Stat Soc Ser C Appl Stat. 2021 Jun;70(3):511-531. doi: 10.1111/rssc.12467. Epub 2021 Jun 4.

A Model-free Variable Screening Method Based on Leverage Score.
J Am Stat Assoc. 2023;118(541):135-146. doi: 10.1080/01621459.2021.1918554. Epub 2021 Jun 21.
