Model-Free Conditional Independence Feature Screening For Ultrahigh Dimensional Data.

Authors

Wang Luheng, Liu Jingyuan, Li Yong, Li Runze

Affiliations

School of Mathematics, Beijing Normal University, Beijing 100875, P.R. China.

Department of Statistics, School of Economics, Wang Yanan Institute for Studies in Economics and Fujian Key Laboratory of Statistical Science, Xiamen University, Xiamen 361005, China.

Publication information

Sci China Math. 2017 Mar;60(3):551-568. doi: 10.1007/s11425-016-0186-8. Epub 2016 Dec 29.

Abstract

Feature screening plays an important role in ultrahigh dimensional data analysis. This paper is concerned with conditional feature screening, where the goal is to detect the association between the response and ultrahigh dimensional predictors (e.g., genetic markers) given a low-dimensional exposure variable (such as clinical or environmental variables). To this end, we first propose a new index to measure conditional independence, and then develop a conditional screening procedure based on this index. We systematically study the theoretical properties of the proposed procedure and establish its sure screening and ranking consistency properties under some very mild conditions. The proposed screening procedure enjoys several appealing properties: (a) it is model-free in that its implementation does not require specifying a model structure; (b) it is robust to heavy-tailed distributions or outliers in both the response and the predictors; and (c) it can deal with both feature screening and conditional screening in a unified way. We study the finite sample performance of the proposed procedure by Monte Carlo simulations and further illustrate the proposed method through two real data examples.
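The abstract does not define the proposed index, so the following is only a minimal sketch of the general shape that conditional screening procedures usually take: score each predictor by a measure of its dependence on the response given the exposure variable, rank the scores, and keep the top-ranked predictors. The utility used here, a rank-based partial correlation, is a stand-in chosen for illustration and is not the paper's index. The function names, the simulated data, and the cutoff `d = 10` are all hypothetical.

```python
import numpy as np


def rank_transform(a):
    """Replace each column by its ranks 0..n-1 (ties broken arbitrarily)."""
    a = np.asarray(a, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    return np.argsort(np.argsort(a, axis=0), axis=0).astype(float)


def residualize(target, z):
    """Residuals of each column of `target` after least squares on z plus an intercept."""
    design = np.column_stack([np.ones(len(z)), z])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


def conditional_screen(x, y, z, d):
    """Return the indices of the d top-ranked predictors and all utilities.

    Stand-in utility (an assumption, NOT the paper's index): the absolute
    partial correlation between rank(y) and rank(x_j) given rank(z).
    """
    rx, ry, rz = rank_transform(x), rank_transform(y), rank_transform(z)
    ex = residualize(rx, rz)  # shape (n, p); zero mean because of the intercept
    ey = residualize(ry, rz)  # shape (n, 1)
    num = (ex * ey).sum(axis=0)
    den = np.sqrt((ex ** 2).sum(axis=0) * (ey ** 2).sum())
    utility = np.abs(num / den)
    order = np.argsort(-utility)  # largest utility first
    return order[:d], utility


# Hypothetical simulated data: 200 samples, 500 predictors, one exposure variable.
rng = np.random.default_rng(0)
n, p = 200, 500
z = rng.standard_normal(n)
x = rng.standard_normal((n, p)) + 0.5 * z[:, None]  # predictors correlated with z
noise = rng.standard_t(df=3, size=n)                # heavy-tailed errors
y = 2.0 * x[:, 0] - 2.0 * x[:, 3] + z + noise       # only predictors 0 and 3 matter

selected, utility = conditional_screen(x, y, z, d=10)
print(sorted(selected.tolist()))
```

Working on ranks keeps this stand-in insensitive to monotone transformations and less sensitive to outliers, which loosely mirrors the robustness goal stated in the abstract. It captures only monotone conditional association, however, so it should not be read as equivalent to a model-free conditional independence index.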

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d1c0/5480220/3e92d6f3c24c/nihms866300f1.jpg

Similar articles

5
Feature Screening via Distance Correlation Learning.
J Am Stat Assoc. 2012 Jul 1;107(499):1129-1139. doi: 10.1080/01621459.2012.695654.
6
Feature screening in ultrahigh-dimensional additive Cox model.
J Stat Comput Simul. 2018;88(6):1117-1133. doi: 10.1080/00949655.2017.1422127. Epub 2018 Jan 8.
