Deep-gKnock: Nonlinear group-feature selection with deep neural networks.

Affiliations

Department of Computer Science and Statistics, University of Rhode Island, United States of America.

Department of Electrical and Computer Engineering, Northeastern University, United States of America.

Publication information

Neural Netw. 2021 Mar;135:139-147. doi: 10.1016/j.neunet.2020.12.004. Epub 2020 Dec 14.

Abstract

Feature selection is central to contemporary high-dimensional data analysis. Group structure among features arises naturally in various scientific problems. Many methods have been proposed to incorporate group structure information into feature selection. However, these methods are normally restricted to a linear regression setting. To relax the linearity constraint, we design a new Deep Neural Network (DNN) architecture and integrate it with the recently proposed knockoff technique to perform nonlinear group-feature selection with controlled group-wise False Discovery Rate (gFDR). Experimental results on high-dimensional synthetic data demonstrate that our method achieves the highest power and accurate gFDR control compared with state-of-the-art methods. The performance of Deep-gKnock is especially superior in the following five situations: (1) nonlinear relationships; (2) dimension p greater than sample size n; (3) high between-group correlation; (4) high within-group correlation; (5) a large number of associated groups. Deep-gKnock is also demonstrated to be robust to misspecification of the feature distribution and to changes in network architecture. Moreover, Deep-gKnock achieves scientifically meaningful group-feature selection results on cutting-edge real-world datasets.
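The abstract does not spell out how groups are selected once the network has been trained. As a rough illustration of what controlled gFDR usually means in knockoff-based methods, the following is a minimal sketch of the generic knockoff+ thresholding step from the knockoff literature, applied to one statistic per group. It is not the authors' implementation: the function names, the input array `w`, and the target level `q` are hypothetical, and the per-group statistics are assumed to be already computed, with large positive values suggesting that an original group matters more than its knockoff copy.

```python
import numpy as np


def knockoff_plus_threshold(w, q):
    """Return the knockoff+ threshold for statistics w at target level q.

    The threshold is the smallest t among the nonzero |w| values for which
    the estimated false discovery proportion
        (1 + #{g : w_g <= -t}) / max(1, #{g : w_g >= t})
    is at most q. Returns infinity when no such t exists.
    """
    w = np.asarray(w, dtype=float)
    candidates = np.unique(np.abs(w[w != 0]))  # sorted ascending
    for t in candidates:
        fdp_hat = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_hat <= q:
            return float(t)
    return float("inf")


def select_groups(w, q):
    """Return indices of groups whose statistic reaches the threshold."""
    w = np.asarray(w, dtype=float)
    t = knockoff_plus_threshold(w, q)
    return np.flatnonzero(w >= t)


# Hypothetical group statistics for ten groups (illustrative values only).
w_example = [5.0, 4.0, 3.0, 2.5, -1.0, 2.0, 0.5, -0.2, 6.0, 7.0]
selected = select_groups(w_example, q=0.2)
```

With these illustrative values the threshold is 2.0, so the groups at indices 0, 1, 2, 3, 5, 8 and 9 are selected. When no threshold meets the target level, the function returns infinity and nothing is selected, which is the conservative behaviour the "+1" in the numerator is meant to give.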
