A quantitative benchmark of neural network feature selection methods for detecting nonlinear signals.

Author information

Passemiers Antoine, Folco Pietro, Raimondi Daniele, Birolo Giovanni, Moreau Yves, Fariselli Piero

Affiliations

ESAT-STADIUS, KU Leuven, Leuven, Belgium.

Department of Medical Sciences, University of Torino, Torino, Italy.

Publication information

Sci Rep. 2024 Dec 28;14(1):31180. doi: 10.1038/s41598-024-82583-5.

DOI: 10.1038/s41598-024-82583-5
PMID: 39732866
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11682240/

Abstract

Classification and regression problems can be challenging when the relevant input features are diluted in noisy datasets, in particular when the sample size is limited. Traditional Feature Selection (FS) methods address this issue by relying on assumptions such as linear or additive relationships between features. Recently, a proliferation of Deep Learning (DL) models has emerged to tackle both FS and prediction at the same time, allowing non-linear modeling of the selected features. In this study, we systematically assess the performance of DL-based feature selection methods on synthetic datasets of varying complexity, and benchmark their efficacy in uncovering non-linear relationships between features. We also use the same settings to benchmark the reliability of gradient-based feature attribution techniques for Neural Networks (NNs), such as Saliency Maps (SM). A quantitative evaluation of the reliability of these approaches is currently missing. Our analysis indicates that even simple synthetic datasets can significantly challenge most of the DL-based FS and SM methods, while Random Forests, TreeShap, mRMR and LassoNet are the best performing FS methods. Our conclusion is that when quantifying the relevance of a few nonlinearly entangled predictive features diluted in a large number of irrelevant noisy variables, DL-based FS and SM interpretation methods are still far from being reliable.
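The core difficulty described in the abstract, a few predictive features that only matter jointly and are hidden among many noise variables, can be illustrated with a toy example. The sketch below is a minimal, standard-library Python illustration under our own assumptions: the XOR-of-signs label, the sample sizes and the helper names (`make_xor_dataset`, `abs_pearson`, `pair_mutual_information`) are hypothetical and are not the paper's benchmark generator or any of its evaluated methods. It shows that a univariate linear score (absolute Pearson correlation) gives the two truly predictive features near-zero relevance, while a score computed on the two features jointly recovers them.

```python
import math
import random
from itertools import combinations


def make_xor_dataset(n_samples, n_noise, seed=0):
    """Hypothetical toy generator: the label is the XOR of the signs of
    features 0 and 1; the remaining n_noise features are pure Gaussian noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        row = [rng.gauss(0.0, 1.0) for _ in range(2 + n_noise)]
        X.append(row)
        y.append(int((row[0] > 0) != (row[1] > 0)))
    return X, y


def abs_pearson(xs, ys):
    """Absolute Pearson correlation: a univariate, linear relevance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return abs(cov) / math.sqrt(vx * vy)


def pair_mutual_information(xi, xj, ys):
    """Mutual information (bits) between a binary label and the joint
    sign pattern of two features, estimated from counts."""
    n = len(ys)
    cells = {}  # (sign_i, sign_j) -> [count of label 0, count of label 1]
    for a, b, t in zip(xi, xj, ys):
        cells.setdefault((a > 0, b > 0), [0, 0])[t] += 1
    p_label = [ys.count(0) / n, ys.count(1) / n]
    mi = 0.0
    for counts in cells.values():
        p_cell = sum(counts) / n
        for t in (0, 1):
            if counts[t]:
                p_joint = counts[t] / n
                mi += p_joint * math.log2(p_joint / (p_cell * p_label[t]))
    return mi


if __name__ == "__main__":
    X, y = make_xor_dataset(n_samples=2000, n_noise=20)
    columns = list(zip(*X))  # one tuple per feature

    # Linear, univariate view: both predictive features look irrelevant.
    corr = [abs_pearson(col, y) for col in columns]
    print("|corr| of features 0 and 1:", round(corr[0], 3), round(corr[1], 3))

    # Joint view: exhaustive O(p^2) pair search, for illustration only.
    best = max(
        combinations(range(len(columns)), 2),
        key=lambda ij: pair_mutual_information(columns[ij[0]], columns[ij[1]], y),
    )
    print("highest-MI feature pair:", best)
```

Running it should report near-zero correlations for features 0 and 1 and select (0, 1) as the highest-scoring pair. The exhaustive pair search grows quadratically with the number of features and is only meant to make the joint dependence visible; it is not one of the FS methods benchmarked in the paper.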

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/24ac/11682240/80d1a3914f47/41598_2024_82583_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/24ac/11682240/d26e38e29caf/41598_2024_82583_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/24ac/11682240/d25b0e58358e/41598_2024_82583_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/24ac/11682240/e79e63b4f6ac/41598_2024_82583_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/24ac/11682240/23596dfd116b/41598_2024_82583_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/24ac/11682240/a3d6ca38d84c/41598_2024_82583_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/24ac/11682240/d5454b6291a7/41598_2024_82583_Fig7_HTML.jpg

Similar articles

1
A quantitative benchmark of neural network feature selection methods for detecting nonlinear signals.
Sci Rep. 2024 Dec 28;14(1):31180. doi: 10.1038/s41598-024-82583-5.
2
Random KNN feature selection - a fast and stable alternative to Random Forests.
BMC Bioinformatics. 2011 Nov 18;12:450. doi: 10.1186/1471-2105-12-450.
3
Enhancing the prediction of IDC breast cancer staging from gene expression profiles using hybrid feature selection methods and deep learning architecture.
Med Biol Eng Comput. 2023 Nov;61(11):2895-2919. doi: 10.1007/s11517-023-02892-1. Epub 2023 Aug 2.
4
LassoNet: Neural Networks with Feature Sparsity.
Proc Mach Learn Res. 2021 Apr;130:10-18.
5
Combining handcrafted features with latent variables in machine learning for prediction of radiation-induced lung damage.
Med Phys. 2019 May;46(5):2497-2511. doi: 10.1002/mp.13497. Epub 2019 Apr 8.
6
Feature Selection Stability and Accuracy of Prediction Models for Genomic Prediction of Residual Feed Intake in Pigs Using Machine Learning.
Front Genet. 2021 Feb 22;12:611506. doi: 10.3389/fgene.2021.611506. eCollection 2021.
7
An MLP-based feature subset selection for HIV-1 protease cleavage site analysis.
Artif Intell Med. 2010 Feb-Mar;48(2-3):83-9. doi: 10.1016/j.artmed.2009.07.010. Epub 2009 Nov 27.
8
Relevance, redundancy, and complementarity trade-off (RRCT): A principled, generic, robust feature-selection tool.
Patterns (N Y). 2022 Mar 31;3(5):100471. doi: 10.1016/j.patter.2022.100471. eCollection 2022 May 13.
9
Benchmark study of feature selection strategies for multi-omics data.
BMC Bioinformatics. 2022 Oct 5;23(1):412. doi: 10.1186/s12859-022-04962-x.
10
On the Stability and Homogeneous Ensemble of Feature Selection for Predictive Maintenance: A Classification Application for Tool Condition Monitoring in Milling.
Sensors (Basel). 2023 May 3;23(9):4461. doi: 10.3390/s23094461.

Cited by

1
Harnessing Machine Learning, a Subset of Artificial Intelligence, for Early Detection and Diagnosis of Type 1 Diabetes: A Systematic Review.
Int J Mol Sci. 2025 Apr 22;26(9):3935. doi: 10.3390/ijms26093935.
2
Mapping Cell Identity from scRNA-seq: A primer on computational methods.
Comput Struct Biotechnol J. 2025 Apr 2;27:1559-1569. doi: 10.1016/j.csbj.2025.03.051. eCollection 2025.

References

1
MarkerMap: nonlinear marker selection for single-cell studies.
NPJ Syst Biol Appl. 2024 Feb 14;10(1):17. doi: 10.1038/s41540-024-00339-3.
2
Deep neural networks with controlled variable selection for the identification of putative causal genetic variants.
Nat Mach Intell. 2022 Sep;4(9):761-771. doi: 10.1038/s42256-022-00525-0. Epub 2022 Sep 15.
3
Large sample size and nonlinear sparse models outline epistatic effects in inflammatory bowel disease.
Genome Biol. 2023 Oct 5;24(1):224. doi: 10.1186/s13059-023-03064-y.
4
Deep neural networks with knockoff features identify nonlinear causal relations and estimate effect sizes in complex biological systems.
Gigascience. 2022 Dec 28;12. doi: 10.1093/gigascience/giad044. Epub 2023 Jul 3.
5
ChatGPT: five priorities for research.
Nature. 2023 Feb;614(7947):224-226. doi: 10.1038/d41586-023-00288-7.
6
LassoNet: Neural Networks with Feature Sparsity.
Proc Mach Learn Res. 2021 Apr;130:10-18.
7
Highly accurate protein structure prediction with AlphaFold.
Nature. 2021 Aug;596(7873):583-589. doi: 10.1038/s41586-021-03819-2. Epub 2021 Jul 15.
8
Using machine learning approaches for multi-omics data analysis: A review.
Biotechnol Adv. 2021 Jul-Aug;49:107739. doi: 10.1016/j.biotechadv.2021.107739. Epub 2021 Mar 29.
9
An interpretable low-complexity machine learning framework for robust exome-based - diagnosis of Crohn's disease patients.
NAR Genom Bioinform. 2020 Feb 21;2(1):lqaa011. doi: 10.1093/nargab/lqaa011. eCollection 2020 Mar.
10
Investigating the relevance of major signaling pathways in cancer survival using a biologically meaningful deep learning model.
BMC Bioinformatics. 2021 Feb 5;22(1):47. doi: 10.1186/s12859-020-03850-6.