Feature Screening via Distance Correlation Learning.

Authors

Li Runze, Zhong Wei, Zhu Liping

Affiliations

The Pennsylvania State University, Xiamen University & Shanghai University of Finance and Economics.

Publication information

J Am Stat Assoc. 2012 Jul 1;107(499):1129-1139. doi: 10.1080/01621459.2012.695654.

Abstract

This paper is concerned with screening features in ultrahigh dimensional data analysis, which has become increasingly important in diverse scientific fields. We develop a sure independence screening procedure based on the distance correlation (DC-SIS, for short). The DC-SIS can be implemented as easily as the sure independence screening procedure based on the Pearson correlation (SIS, for short) proposed by Fan and Lv (2008). However, the DC-SIS can significantly improve the SIS. Fan and Lv (2008) established the sure screening property for the SIS based on linear models, but the sure screening property is valid for the DC-SIS under more general settings including linear models. Furthermore, the implementation of the DC-SIS does not require model specification (e.g., linear model or generalized linear model) for responses or predictors. This is a very appealing property in ultrahigh dimensional data analysis. Moreover, the DC-SIS can be used directly to screen grouped predictor variables and for multivariate response variables. We establish the sure screening property for the DC-SIS, and conduct simulations to examine its finite sample performance. Numerical comparison indicates that the DC-SIS performs much better than the SIS in various models. We also illustrate the DC-SIS through a real data example.
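The screening idea in the abstract can be illustrated with a minimal sketch: compute the sample squared distance correlation between each predictor and the response, then keep the top-ranked predictors. The function names (`distance_correlation_sq`, `dc_screen`) and the cutoff `d` are hypothetical choices for illustration, not taken from the paper, and the statistic used here is the standard sample (V-statistic) distance correlation rather than anything confirmed as the authors' exact implementation.

```python
import numpy as np


def _centered_distances(z):
    """Double-centered matrix of pairwise Euclidean distances for one sample."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a vector as n observations of one variable
    d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1))
    # Subtract row and column means, then add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation_sq(x, y):
    """Sample squared distance correlation between x and y (each n rows)."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2_xy = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return 0.0 if denom == 0 else dcov2_xy / denom


def dc_screen(X, y, d):
    """Rank the columns of X by squared distance correlation with y.

    `d` (how many predictors to keep) is an assumption supplied by the caller.
    Returns the indices of the d top-ranked columns and all scores.
    """
    scores = np.array(
        [distance_correlation_sq(X[:, k], y) for k in range(X.shape[1])]
    )
    keep = np.argsort(scores)[::-1][:d]
    return keep, scores
```

Because the distance matrices are built from Euclidean norms, the same `distance_correlation_sq` call accepts a block of columns (a grouped predictor) or a multi-column response, which mirrors the abstract's remark that the procedure extends to grouped predictors and multivariate responses. No regression model is fitted at any point.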

Similar articles

1
Feature Screening via Distance Correlation Learning.
J Am Stat Assoc. 2012 Jul 1;107(499):1129-1139. doi: 10.1080/01621459.2012.695654.

Cited by

6
Universally Consistent K-Sample Tests via Dependence Measures.
Stat Probab Lett. 2025 Jan;216. doi: 10.1016/j.spl.2024.110278. Epub 2024 Sep 19.
10
A Model-free Approach for Testing Association.
J R Stat Soc Ser C Appl Stat. 2021 Jun;70(3):511-531. doi: 10.1111/rssc.12467. Epub 2021 Jun 4.

References

2
Model-Free Feature Screening for Ultrahigh Dimensional Data.
J Am Stat Assoc. 2011 Jan 1;106(496):1464-1475. doi: 10.1198/jasa.2011.tm10563. Epub 2012 Jan 24.
