
Conditional Sure Independence Screening.

Author Information

Barut Emre, Fan Jianqing, Verhasselt Anneleen

Affiliations

Department of Statistics, George Washington University, Washington, DC 20052, USA.

Department of Operations Research & Financial Engineering, Princeton University, Princeton, NJ 08544, USA and special-term professor, School of Big Data, Fudan University, Shanghai, China.

Publication Information

J Am Stat Assoc. 2016;111(515):1266-1277. doi: 10.1080/01621459.2015.1092974. Epub 2016 Oct 18.

Abstract

Independence screening is powerful for variable selection when the number of variables is massive. Commonly used independence screening methods are based on marginal correlations or their variants. When some prior knowledge on a certain important set of variables is available, a natural assessment of the relative importance of the other predictors is their conditional contributions to the response given the known set of variables. This results in conditional sure independence screening (CSIS). CSIS produces a rich family of alternative screening methods through different choices of the conditioning set and can help reduce the number of false positive and false negative selections when covariates are highly correlated. This paper proposes and studies CSIS in generalized linear models. We give conditions under which sure screening is possible and derive an upper bound on the number of selected variables. We also spell out the situation under which CSIS yields model selection consistency and the properties of CSIS when a data-driven conditioning set is used. Moreover, we provide two data-driven methods to select the thresholding parameter of conditional screening. The utility of the procedure is illustrated by simulation studies and analysis of two real datasets.
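The conditional screening idea described above can be illustrated with a minimal sketch. This uses a plain linear model for simplicity (the paper works in generalized linear models, where the score would be the conditional maximum marginal likelihood estimator); all function and variable names are illustrative, not from the paper:

```python
import numpy as np

def csis_rank(X, y, cond_idx, d):
    """CSIS-style sketch for a linear model: score each candidate variable j
    by |beta_j| from the least-squares fit of y on the known conditioning
    set plus column j, then keep the d highest-scoring candidates."""
    n, p = X.shape
    cond = list(cond_idx)
    candidates = [j for j in range(p) if j not in set(cond)]
    X_cond = X[:, cond]
    scores = {}
    for j in candidates:
        # Design matrix: intercept, conditioning variables, candidate j.
        Z = np.column_stack([np.ones(n), X_cond, X[:, j]])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        scores[j] = abs(beta[-1])  # conditional contribution of variable j
    return sorted(candidates, key=lambda j: -scores[j])[:d]

# Toy example: prior knowledge says variable 0 is important, so it is
# placed in the conditioning set; the remaining variables are screened.
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] + 3.0 * X[:, 5] + 0.5 * rng.standard_normal(n)
selected = csis_rank(X, y, cond_idx=[0], d=3)
```

Here the retained set size `d` is fixed by hand; the paper instead proposes two data-driven rules for choosing the thresholding parameter.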

Similar Articles

1
Conditional Sure Independence Screening.
J Am Stat Assoc. 2016;111(515):1266-1277. doi: 10.1080/01621459.2015.1092974. Epub 2016 Oct 18.
9
Feature Screening via Distance Correlation Learning.
J Am Stat Assoc. 2012 Jul 1;107(499):1129-1139. doi: 10.1080/01621459.2012.695654.
10
Variable screening via quantile partial correlation.
J Am Stat Assoc. 2017;112(518):650-663. doi: 10.1080/01621459.2016.1156545. Epub 2017 Mar 30.

Cited By

2
Joint Screening for Ultra-High Dimensional Multi-Omics Data.
Bioengineering (Basel). 2024 Nov 25;11(12):1193. doi: 10.3390/bioengineering11121193.
4
Are Latent Factor Regression and Sparse Regression Adequate?
J Am Stat Assoc. 2024;119(546):1076-1088. doi: 10.1080/01621459.2023.2169700. Epub 2023 Feb 14.
5
Image response regression via deep neural networks.
J R Stat Soc Series B Stat Methodol. 2023 Nov;85(5):1589-1614. doi: 10.1093/jrsssb/qkad073. Epub 2023 Jul 24.
6
Quantile forward regression for high-dimensional survival data.
Lifetime Data Anal. 2023 Oct;29(4):769-806. doi: 10.1007/s10985-023-09603-w. Epub 2023 Jul 2.

References

1
Feature Screening via Distance Correlation Learning.
J Am Stat Assoc. 2012 Jul 1;107(499):1129-1139. doi: 10.1080/01621459.2012.695654.
2
Model-Free Feature Screening for Ultrahigh Dimensional Data.
J Am Stat Assoc. 2011 Jan 1;106(496):1464-1475. doi: 10.1198/jasa.2011.tm10563. Epub 2012 Jan 24.
4
Non-Concave Penalized Likelihood with NP-Dimensionality.
IEEE Trans Inf Theory. 2011 Aug;57(8):5467-5484. doi: 10.1109/TIT.2011.2158486.