Model-Free Feature Screening for Ultrahigh Dimensional Data.

Author information

Zhu Liping, Li Lexin, Li Runze, Zhu Lixing

Publication information

J Am Stat Assoc. 2011 Jan 1;106(496):1464-1475. doi: 10.1198/jasa.2011.tm10563. Epub 2012 Jan 24.

DOI: 10.1198/jasa.2011.tm10563

PMID: 22754050

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3384506/

Abstract

With the recent explosion of scientific data of unprecedented size and complexity, feature ranking and screening are playing an increasingly important role in many scientific studies. In this article, we propose a novel feature screening procedure under a unified model framework, which covers a wide variety of commonly used parametric and semiparametric models. The new method does not require imposing a specific model structure on regression functions, and thus is particularly appealing to ultrahigh-dimensional regressions, where there are a huge number of candidate predictors but little information about the actual model forms. We demonstrate that, with the number of predictors growing at an exponential rate of the sample size, the proposed procedure possesses consistency in ranking, which is both useful in its own right and can lead to consistency in selection. The new procedure is computationally efficient and simple, and exhibits a competent empirical performance in our intensive simulations and real data analysis.
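The abstract describes the procedure only at a high level: each predictor receives a marginal score computed without specifying a regression function, the predictors are ranked by that score, and the top-ranked ones are retained. The Python sketch below illustrates that rank-then-screen idea under an assumption the abstract does not state: that the score for a standardized predictor X_k is the rank-based utility ω̂_k = (1/n) Σ_j [ (1/n) Σ_i X_ik · 1(Y_i < Y_j) ]², which estimates the mean square of E[X_k · 1(Y < y)] over the response distribution. This form is modeled on the sure independence ranking and screening (SIRS) utility associated with this paper, but it should be checked against the full text before being treated as the authors' exact procedure. The function names (standardize, marginal_utility, screen, simulate) and the toy data-generating model are hypothetical.

```python
import math
import random


def standardize(column):
    """Center a predictor and scale it to unit variance (divisor n)."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
    if sd == 0.0:
        return [0.0] * n  # a constant predictor carries no signal
    return [(v - mean) / sd for v in column]


def marginal_utility(x, y):
    """Assumed rank-based marginal utility for one standardized predictor x:

        omega = (1/n) * sum_j [ (1/n) * sum_i x_i * 1(y_i < y_j) ]^2

    The inner average estimates E[X * 1(Y < y)] at y = y_j; the outer average
    takes its mean square over the observed responses. No regression function
    is specified or fitted.
    """
    n = len(x)
    total = 0.0
    for y_j in y:
        inner = sum(x_i for x_i, y_i in zip(x, y) if y_i < y_j) / n
        total += inner * inner
    return total / n


def screen(columns, y, d):
    """Score every predictor, rank by score, and keep the indices of the top d."""
    scores = [marginal_utility(standardize(col), y) for col in columns]
    ranking = sorted(range(len(columns)), key=lambda k: scores[k], reverse=True)
    return ranking[:d], scores


def simulate(n, p, seed=0):
    """Hypothetical toy data: only predictors 0 and 1 matter, both nonlinearly."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    y = [
        math.exp(columns[0][i]) + 0.5 * columns[1][i] ** 3 + 0.1 * rng.gauss(0.0, 1.0)
        for i in range(n)
    ]
    return columns, y


if __name__ == "__main__":
    columns, y = simulate(n=400, p=10)
    selected, _ = screen(columns, y, d=2)
    print("retained predictors:", sorted(selected))
```

Run as a script, it simulates 400 observations on 10 predictors and prints the two retained indices. Because the score averages X_k against the indicator 1(Y < y), it picks up monotone nonlinear effects such as exp(X_0) and X_1³ in the toy model without fitting either form. A purely symmetric effect, for example Y = X_k² + ε with X_k symmetric about zero and independent of ε, makes E[X_k · 1(Y < y)] equal to zero for every y, so a utility of this kind can miss it.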

Similar articles

Model-Free Conditional Independence Feature Screening For Ultrahigh Dimensional Data.

Sci China Math. 2017 Mar;60(3):551-568. doi: 10.1007/s11425-016-0186-8. Epub 2016 Dec 29.

Model-Free Feature Screening for Ultrahigh Dimensional Discriminant Analysis.

J Am Stat Assoc. 2015 Jun 1;110(510):630-641. doi: 10.1080/01621459.2014.920256.

Feature Screening in Ultrahigh Dimensional Cox's Model.

Stat Sin. 2016;26:881-901. doi: 10.5705/ss.2014.171.

Feature Selection for Varying Coefficient Models With Ultrahigh Dimensional Covariates.

J Am Stat Assoc. 2014 Jan 1;109(505):266-274. doi: 10.1080/01621459.2013.850086.

Group Feature Screening via the F Statistic.

Commun Stat Simul Comput. 2022;51(4):1921-1931. doi: 10.1080/03610918.2019.1691223. Epub 2019 Nov 26.

Feature Screening for Ultrahigh Dimensional Categorical Data with Applications.

J Bus Econ Stat. 2014;32(2):237-244. doi: 10.1080/07350015.2013.863158.

Feature screening in ultrahigh-dimensional additive Cox model.

J Stat Comput Simul. 2018;88(6):1117-1133. doi: 10.1080/00949655.2017.1422127. Epub 2018 Jan 8.

Feature Screening in Ultrahigh Dimensional Generalized Varying-coefficient Models.

Stat Sin. 2020;30:1049-1067. doi: 10.5705/ss.202017.0362.

Feature screening in ultrahigh-dimensional varying-coefficient Cox model.

J Multivar Anal. 2019 May;171:284-297. doi: 10.1016/j.jmva.2018.12.009. Epub 2018 Dec 28.

Cited by

Feature Ranking on Small Samples: A Bayes-Based Approach.

Entropy (Basel). 2025 Jul 22;27(8):773. doi: 10.3390/e27080773.

Detection of LUAD-Associated Genes Using Wasserstein Distance in Multiomics Feature Selection.

Bioengineering (Basel). 2025 Jun 25;12(7):694. doi: 10.3390/bioengineering12070694.

-KIDS: A Novel Feature Evaluation in the Ultrahigh-Dimensional Right-Censored Setting, With Application to Head and Neck Cancer.

Stat Med. 2025 Jul;44(15-17):e70167. doi: 10.1002/sim.70167.

Uncertainty Quantification in Epigenetic Clocks via Conformalized Quantile Regression.

Genet Epidemiol. 2025 Jun;49(4):e70008. doi: 10.1002/gepi.70008.

Optimizing Prognostic Predictions in Liver Cancer with Machine Learning and Survival Analysis.

Entropy (Basel). 2024 Sep 7;26(9):767. doi: 10.3390/e26090767.

Uncertainty quantification in epigenetic clocks via conformalized quantile regression.

medRxiv. 2025 Feb 11:2024.09.06.24313192. doi: 10.1101/2024.09.06.24313192.

Are Latent Factor Regression and Sparse Regression Adequate?

J Am Stat Assoc. 2024;119(546):1076-1088. doi: 10.1080/01621459.2023.2169700. Epub 2023 Feb 14.

-KIDS: A novel feature evaluation in the ultrahigh-dimensional right-censored setting, with application to Head and Neck Cancer.

medRxiv. 2024 Aug 14:2024.08.13.24311946. doi: 10.1101/2024.08.13.24311946.

Text-mining-based feature selection for anticancer drug response prediction.

Bioinform Adv. 2024 Mar 26;4(1):vbae047. doi: 10.1093/bioadv/vbae047. eCollection 2024.

Robust Alternatives to ANCOVA for Estimating the Treatment Effect via a Randomized Comparative Study.

J Am Stat Assoc. 2019;114(528):1854-1864. doi: 10.1080/01621459.2018.1527226. Epub 2019 Mar 18.
