Model-Free Forward Screening via Cumulative Divergence

Authors

Zhou Tingyou, Zhu Liping, Xu Chen, Li Runze

Affiliations

School of Data Sciences, Zhejiang University of Finance and Economics, Hangzhou, P. R. China.

Institute of Statistics and Big Data and Center for Applied Statistics, Renmin University of China, Beijing, P. R. China.

Publication information

J Am Stat Assoc. 2020;115(531):1393-1405. doi: 10.1080/01621459.2019.1632078. Epub 2019 Jul 22.

Abstract

Feature screening plays an important role in the analysis of ultrahigh dimensional data. Because of complicated model structures and high noise levels, existing screening methods often suffer from model misspecification and the presence of outliers. To address these issues, we introduce a new metric named cumulative divergence (CD), and develop a CD-based forward screening procedure. This forward screening method is model-free and resistant to the presence of outliers in the response. It also incorporates the joint effects among covariates into the screening process. With a data-driven threshold, the new method can automatically determine the number of features that should be retained after screening. These merits make CD-based screening very appealing in practice. Under certain regularity conditions, we show that the proposed method possesses the sure screening property. The performance of our proposal is illustrated through simulations and a real data example.
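
The abstract does not state the formula for CD or the rule behind the data-driven threshold, so the following is only a rough sketch of what a forward screening loop of this general kind might look like. It makes several assumptions that are not taken from the paper: the utility `indicator_utility` is a stand-in (the average squared correlation between a covariate and the indicators 1(Y <= c), which depends on the response only through its ordering), joint effects are approximated by regressing each candidate on the covariates already selected, and `threshold` is a hypothetical user-supplied constant rather than the paper's data-driven threshold.

```python
import numpy as np


def indicator_utility(x, y):
    """Stand-in utility (an assumption, NOT the paper's CD): the mean over
    cut points c of the squared correlation between x and 1(y <= c)."""
    x = x - x.mean()
    sx = x.std()
    if sx < 1e-12:
        return 0.0
    cuts = np.sort(y)[:-1]  # the largest value would give a constant indicator
    ind = (y[:, None] <= cuts[None, :]).astype(float)
    ind -= ind.mean(axis=0)
    si = ind.std(axis=0)
    keep = si > 0  # guard against constant indicators caused by ties
    if not keep.any():
        return 0.0
    corr = (x @ ind[:, keep]) / (len(x) * sx * si[keep])
    return float(np.mean(corr ** 2))


def forward_screen(X, y, threshold, max_steps=None):
    """Greedy forward screening: add the best-scoring covariate at each step
    and stop once the best score falls below `threshold` (hypothetical)."""
    n, p = X.shape
    if max_steps is None:
        max_steps = min(p, n - 2)
    selected = []
    for _ in range(max_steps):
        # Remove the linear effect of an intercept and the selected covariates
        # from every candidate, so a score reflects what the candidate adds.
        Z = np.column_stack([np.ones(n)] + [X[:, j] for j in selected])
        coef, *_ = np.linalg.lstsq(Z, X, rcond=None)
        R = X - Z @ coef
        scores = np.array([
            -np.inf if j in selected else indicator_utility(R[:, j], y)
            for j in range(p)
        ])
        best = int(np.argmax(scores))
        if scores[best] < threshold:
            break
        selected.append(best)
    return selected


# Toy usage: two active covariates and a few gross outliers in the response.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 30))
y = 3.0 * X[:, 0] - 3.0 * X[:, 1] + rng.standard_normal(200)
y[:5] += 1000.0
selected = forward_screen(X, y, threshold=0.1)
```

Because the stand-in utility uses only indicators of the response, the five contaminated responses affect the scores through their ranks alone, which is the kind of robustness the abstract attributes to the CD-based procedure. The stopping rule here is deliberately simplistic and should not be read as the authors' method.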
