An Efficient, Parallelized Algorithm for Optimal Conditional Entropy-Based Feature Selection.

Authors

Estrela Gustavo, Gubitoso Marco Dimas, Ferreira Carlos Eduardo, Barrera Junior, Reis Marcelo S

Affiliations

Center of Toxins, Immune-Response and Cell Signaling (CeTICS), Laboratório de Ciclo Celular, Instituto Butantan, Butantã, São Paulo-SP 05503-900, Brazil.

Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo-SP 05503-900, Brazil.

Publication information

Entropy (Basel). 2020 Apr 24;22(4):492. doi: 10.3390/e22040492.

DOI:10.3390/e22040492
PMID:33286261
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7516975/
Abstract

In machine learning, feature selection is an important step in classifier design. It consists of finding a subset of features that is optimal for a given cost function. One way to solve feature selection is to organize all possible feature subsets into a Boolean lattice and to exploit the fact that the costs along chains in that lattice describe U-shaped curves. Minimizing such a cost function is known as the U-curve problem. Recently, a study proposed U-Curve Search (UCS), an optimal algorithm for that problem, which was successfully used for feature selection. However, despite the algorithm's optimality, the time UCS required in computational assays was exponential in the number of features. Here, we report that this scalability issue arises because the U-curve problem is NP-hard. We then introduce Parallel U-Curve Search (PUCS), a new algorithm for the U-curve problem. In PUCS, we present a novel way to partition the search space into smaller Boolean lattices, which renders the algorithm highly parallelizable. We also provide computational assays with both synthetic data and machine learning datasets, in which the performance of PUCS was assessed against UCS and other gold-standard feature selection algorithms.
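The U-curve property and the idea of splitting the lattice into smaller Boolean lattices can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the toy cost function, the names `TARGET`, `cost`, `is_u_shaped`, and `partition_minimum`, and the choice to split the lattice by fixing a few features. It is not the PUCS algorithm and does not reproduce the paper's partitioning scheme or its cost function; each part is simply searched exhaustively.

```python
from itertools import combinations, product

N_FEATURES = 4
TARGET = frozenset({0, 2})  # hypothetical "ideal" subset, for illustration only


def cost(subset):
    """Toy cost: the maximum of a term that grows with the subset and one that shrinks.

    The maximum of a non-decreasing and a non-increasing set function is
    U-shaped along every chain, the property the U-curve problem assumes.
    """
    s = frozenset(subset)
    return max(len(s - TARGET), len(TARGET - s))


def all_subsets(features):
    """Yield every subset of `features` as a frozenset."""
    features = list(features)
    for k in range(len(features) + 1):
        for combo in combinations(features, k):
            yield frozenset(combo)


def is_u_shaped(n):
    """Check cost(Y) <= max(cost(X), cost(Z)) for all X <= Y <= Z in the lattice."""
    subsets = list(all_subsets(range(n)))
    return all(
        cost(y) <= max(cost(x), cost(z))
        for x in subsets
        for z in subsets
        if x <= z
        for y in subsets
        if x <= y <= z
    )


def partition_minimum(n, fixed):
    """Split the lattice by the membership pattern of the `fixed` features.

    Each pattern defines a smaller Boolean lattice over the remaining free
    features. The parts are disjoint and cover the whole lattice, so they
    could be searched independently (for example, in parallel).
    """
    fixed = list(fixed)
    free = [f for f in range(n) if f not in fixed]
    best = None
    for pattern in product([False, True], repeat=len(fixed)):
        base = frozenset(f for f, keep in zip(fixed, pattern) if keep)
        # Exhaustive search inside one part; assumed here only for simplicity.
        local = min((base | s for s in all_subsets(free)), key=cost)
        if best is None or cost(local) < cost(best):
            best = local
    return best
```

Calling `partition_minimum(N_FEATURES, [0, 1])` searches four parts of four subsets each and returns the same minimum as a scan of all sixteen subsets. The exhaustive inner search still visits every subset, so the sketch shows only why the parts can be handled independently, not how an optimal algorithm prunes them.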

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f8cc/7516975/84591b86b911/entropy-22-00492-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f8cc/7516975/88ec285a20bc/entropy-22-00492-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f8cc/7516975/d02874af91a9/entropy-22-00492-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f8cc/7516975/9a0586effadb/entropy-22-00492-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f8cc/7516975/7a4eb10005fb/entropy-22-00492-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f8cc/7516975/89d22ac8941a/entropy-22-00492-g006.jpg

Similar articles

1
An Efficient, Parallelized Algorithm for Optimal Conditional Entropy-Based Feature Selection.
Entropy (Basel). 2020 Apr 24;22(4):492. doi: 10.3390/e22040492.
2
An Efficient Feature Subset Selection Algorithm for Classification of Multidimensional Dataset.
ScientificWorldJournal. 2015;2015:821798. doi: 10.1155/2015/821798. Epub 2015 Sep 28.
3
An Efficient Feature Selection Strategy Based on Multiple Support Vector Machine Technology with Gene Expression Data.
Biomed Res Int. 2018 Aug 30;2018:7538204. doi: 10.1155/2018/7538204. eCollection 2018.
4
Identifying (Quasi) Equally Informative Subsets in Feature Selection Problems for Classification: A Max-Relevance Min-Redundancy Approach.
IEEE Trans Cybern. 2016 Jun;46(6):1424-37. doi: 10.1109/TCYB.2015.2444435. Epub 2015 Jul 6.
5
Adaptive feature selection using v-shaped binary particle swarm optimization.
PLoS One. 2017 Mar 30;12(3):e0173907. doi: 10.1371/journal.pone.0173907. eCollection 2017.
6
A novel feature selection approach for biomedical data classification.
J Biomed Inform. 2010 Feb;43(1):15-23. doi: 10.1016/j.jbi.2009.07.008. Epub 2009 Jul 30.
7
A Wrapper Feature Subset Selection Method Based on Randomized Search and Multilayer Structure.
Biomed Res Int. 2019 Nov 4;2019:9864213. doi: 10.1155/2019/9864213. eCollection 2019.
8
An Efficient Hybrid Feature Selection Method Using the Artificial Immune Algorithm for High-Dimensional Data.
Comput Intell Neurosci. 2022 Oct 13;2022:1452301. doi: 10.1155/2022/1452301. eCollection 2022.
9
Improving the Mann-Whitney statistical test for feature selection: an approach in breast cancer diagnosis on mammography.
Artif Intell Med. 2015 Jan;63(1):19-31. doi: 10.1016/j.artmed.2014.12.004. Epub 2014 Dec 12.
10
Feature selection based on dependency margin.
IEEE Trans Cybern. 2015 Jun;45(6):1209-21. doi: 10.1109/TCYB.2014.2347372. Epub 2014 Sep 26.
