Extending greedy feature selection algorithms to multiple solutions.

Author information

Borboudakis Giorgos, Tsamardinos Ioannis

Affiliations

University of Crete, Heraklion, Greece.

Gnosis Data Analysis (JADBio), Heraklion, Greece.

Publication information

Data Min Knowl Discov. 2021;35(4):1393-1434. doi: 10.1007/s10618-020-00731-7. Epub 2021 May 1.

DOI:10.1007/s10618-020-00731-7
PMID:34720675
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8550441/
Abstract

Most feature selection methods identify only a single solution. This is acceptable for predictive purposes, but is not sufficient for knowledge discovery if multiple solutions exist. We propose a strategy to extend a class of greedy methods to efficiently identify multiple solutions, and show under which conditions it identifies all solutions. We also introduce a taxonomy of features that takes the existence of multiple solutions into account. Furthermore, we explore different definitions of statistical equivalence of solutions, as well as methods for testing equivalence. A novel algorithm for compactly representing and visualizing multiple solutions is also introduced. In experiments we show that (a) the proposed algorithm is significantly more computationally efficient than the TIE* algorithm, the only alternative approach with similar theoretical guarantees, while identifying similar solutions to it, and (b) that the identified solutions have similar predictive performance.
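The abstract describes the approach only at a high level. The sketch below illustrates one simple way a greedy forward-selection method can be made to report several solutions instead of one. It rests on assumptions that are not taken from the paper: a Gaussian partial-correlation (Fisher z) conditional-independence test, and a pairwise swap rule under which an unselected feature y is treated as equivalent to a selected feature x when y adds nothing given the rest of the selection plus x, and x adds nothing given the rest plus y. All function names (`ci_pvalue`, `forward_selection`, `equivalence_classes`, `all_solutions`, `demo_data`) and the toy data are hypothetical. This is not the authors' algorithm and it carries none of the paper's theoretical guarantees.

```python
import itertools
import math


def _dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def _residualize(v, conditioning):
    """Residual of v after least-squares projection onto span{1, conditioning}."""
    n = len(v)
    basis = [[1.0 / math.sqrt(n)] * n]  # orthonormal basis, starting with the intercept
    for z in conditioning:
        w = list(z)
        for q in basis:  # modified Gram-Schmidt against the current basis
            c = _dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(_dot(w, w))
        if norm > 1e-9 * math.sqrt(_dot(z, z)):  # skip (near-)collinear columns
            basis.append([wi / norm for wi in w])
    r = list(v)
    for q in basis:
        c = _dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


def ci_pvalue(x, y, conditioning):
    """Fisher-z partial-correlation test of x independent of y given conditioning."""
    rx, ry = _residualize(x, conditioning), _residualize(y, conditioning)
    nx, ny = math.sqrt(_dot(rx, rx)), math.sqrt(_dot(ry, ry))
    # Assumed convention: a variable fully explained by the conditioning set
    # carries no extra information, so report independence (p = 1).
    if nx <= 1e-9 * math.sqrt(_dot(x, x)) or ny <= 1e-9 * math.sqrt(_dot(y, y)):
        return 1.0
    r = max(-1 + 1e-12, min(1 - 1e-12, _dot(rx, ry) / (nx * ny)))
    z = math.atanh(r) * math.sqrt(len(x) - len(conditioning) - 3)
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


def forward_selection(data, target, alpha=0.05):
    """Greedy forward selection: repeatedly add the most dependent feature."""
    selected, remaining = [], [f for f in data if f != target]
    while remaining:
        cond = [data[s] for s in selected]
        pvals = {f: ci_pvalue(data[f], data[target], cond) for f in remaining}
        best = min(remaining, key=lambda f: pvals[f])
        if pvals[best] >= alpha:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


def equivalence_classes(data, target, selected, alpha=0.05):
    """For each selected x, collect unselected y that can be swapped in for it."""
    classes = []
    for x in selected:
        rest = [data[s] for s in selected if s != x]
        cls = [x]
        for y in data:
            if y == target or y in selected:
                continue
            y_redundant = ci_pvalue(data[y], data[target], rest + [data[x]]) >= alpha
            x_redundant = ci_pvalue(data[x], data[target], rest + [data[y]]) >= alpha
            if y_redundant and x_redundant:
                cls.append(y)
        classes.append(cls)
    return classes


def all_solutions(classes):
    """Every way of picking one member per class (Cartesian product)."""
    return [set(combo) for combo in itertools.product(*classes)]


def demo_data(n=200):
    """Deterministic, noise-free toy data; the target is t = x1 + x2."""
    x1 = [math.sin(i) for i in range(n)]
    data = {
        "x1": x1,
        "x1_scaled": [2.0 * v for v in x1],                 # exact rescaled copy of x1
        "x2": [math.cos(1.7 * i) for i in range(n)],
        "x3": [math.sin(3.1 * i + 0.5) for i in range(n)],  # irrelevant to t
    }
    data["t"] = [a + b for a, b in zip(data["x1"], data["x2"])]
    return data


data = demo_data()
selected = forward_selection(data, "t")
classes = equivalence_classes(data, "t", selected)
print("one solution:", selected)
print("equivalence classes:", classes)
print("all solutions:", all_solutions(classes))
```

Storing one list of interchangeable features per selected slot, and expanding the Cartesian product only when needed, keeps the output compact when many features are interchangeable. The pairwise swap rule is a simplification chosen for brevity; it does not capture equivalences that require exchanging several features at once.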

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bcd/8550441/0812c26b0396/10618_2020_731_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bcd/8550441/4219be572ece/10618_2020_731_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bcd/8550441/f8e74891c2c2/10618_2020_731_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bcd/8550441/ffff6f86d93e/10618_2020_731_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bcd/8550441/9f0525df7bbc/10618_2020_731_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bcd/8550441/57b8c5b8b20c/10618_2020_731_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8bcd/8550441/7acb05d0f19a/10618_2020_731_Fig7_HTML.jpg

Similar articles

1
Extending greedy feature selection algorithms to multiple solutions.
Data Min Knowl Discov. 2021;35(4):1393-1434. doi: 10.1007/s10618-020-00731-7. Epub 2021 May 1.
2
Wrapper-based selection of genetic features in genome-wide association studies through fast matrix operations.
Algorithms Mol Biol. 2012 May 2;7(1):11. doi: 10.1186/1748-7188-7-11.
3
Cost-Constrained feature selection in binary classification: adaptations for greedy forward selection and genetic algorithms.
BMC Bioinformatics. 2020 Jan 28;21(1):26. doi: 10.1186/s12859-020-3361-9.
4
Feature selection based on neighborhood rough sets and Gini index.
PeerJ Comput Sci. 2023 Dec 12;9:e1711. doi: 10.7717/peerj-cs.1711. eCollection 2023.
5
An efficient statistical feature selection approach for classification of gene expression data.
J Biomed Inform. 2011 Aug;44(4):529-35. doi: 10.1016/j.jbi.2011.01.001. Epub 2011 Jan 15.
6
Multiple feature selection based on an optimization strategy for causal analysis of health data.
Health Inf Sci Syst. 2024 Nov 12;12(1):52. doi: 10.1007/s13755-024-00312-8. eCollection 2024 Dec.
7
Efficient Minimum Cost Seed Selection With Theoretical Guarantees for Competitive Influence Maximization.
IEEE Trans Cybern. 2021 Dec;51(12):6091-6104. doi: 10.1109/TCYB.2020.2966593. Epub 2021 Dec 22.
8
Greedy feature selection for glycan chromatography data with the generalized Dirichlet distribution.
BMC Bioinformatics. 2013 May 7;14:155. doi: 10.1186/1471-2105-14-155.
9
Input feature selection for classification problems.
IEEE Trans Neural Netw. 2002;13(1):143-59. doi: 10.1109/72.977291.
10
Novel Improved Salp Swarm Algorithm: An Application for Feature Selection.
Sensors (Basel). 2022 Feb 22;22(5):1711. doi: 10.3390/s22051711.

Cited by

1
A hybrid critical channels and optimal feature subset selection framework for EEG fatigue recognition.
Sci Rep. 2025 Jan 16;15(1):2139. doi: 10.1038/s41598-025-86234-1.
2
The plasma miRNAome in ADNI: Signatures to aid the detection of at-risk individuals.
Alzheimers Dement. 2024 Nov;20(11):7479-7494. doi: 10.1002/alz.14157. Epub 2024 Sep 18.
3
A Snapshot-Stacked Ensemble and Optimization Approach for Vehicle Breakdown Prediction.
Sensors (Basel). 2023 Jun 15;23(12):5621. doi: 10.3390/s23125621.
4
A Machine Learning Model to Predict Knee Osteoarthritis Cartilage Volume Changes over Time Using Baseline Bone Curvature.
Biomedicines. 2022 May 26;10(6):1247. doi: 10.3390/biomedicines10061247.
5
Just Add Data: automated predictive modeling for knowledge discovery and feature selection.
NPJ Precis Oncol. 2022 Jun 16;6(1):38. doi: 10.1038/s41698-022-00274-8.
6
Liquid Biopsy in Type 2 Diabetes Mellitus Management: Building Specific Biosignatures via Machine Learning.
J Clin Med. 2022 Feb 17;11(4):1045. doi: 10.3390/jcm11041045.

References

1
A greedy feature selection algorithm for Big Data of high dimensionality.
Mach Learn. 2019;108(2):149-202. doi: 10.1007/s10994-018-5748-7. Epub 2018 Aug 7.
2
A multi-marker association method for genome-wide association studies without the need for population structure correction.
Nat Commun. 2016 Nov 10;7:13299. doi: 10.1038/ncomms13299.
3
Bridging a translational gap: using machine learning to improve the prediction of PTSD.
BMC Psychiatry. 2015 Mar 16;15:30. doi: 10.1186/s12888-015-0399-8.
4
T-ReCS: stable selection of dynamically formed groups of features with application to prediction of clinical outcomes.
Pac Symp Biocomput. 2015;20:431-42.
5
Algorithms for Discovery of Multiple Markov Boundaries.
J Mach Learn Res. 2013 Feb;14:499-566.
6
2D image registration in CT images using radial image descriptors.
Med Image Comput Comput Assist Interv. 2011;14(Pt 2):607-14. doi: 10.1007/978-3-642-23629-7_74.
7
Analysis and computational dissection of molecular signature multiplicity.
PLoS Comput Biol. 2010 May 20;6(5):e1000790. doi: 10.1371/journal.pcbi.1000790.
8
Ensemble gene selection by grouping for microarray data classification.
J Biomed Inform. 2010 Feb;43(1):81-7. doi: 10.1016/j.jbi.2009.08.010. Epub 2009 Aug 20.
9
On the number of close-to-optimal feature sets.
Cancer Inform. 2007 Feb 16;2:189-96.
10
Multiple robust signatures for detecting lymph node metastasis in head and neck cancer.
Cancer Res. 2006 Feb 15;66(4):2361-6. doi: 10.1158/0008-5472.CAN-05-3960.