Interdisciplinary Center for Biotechnology Research, University of Florida, PO Box 103622, Gainesville, FL 32610-3622, USA.
IEEE Trans Pattern Anal Mach Intell. 2010 Sep;32(9):1610-26. doi: 10.1109/TPAMI.2009.190.
This paper considers feature selection for data classification in the presence of a huge number of irrelevant features. We propose a new feature-selection algorithm that addresses several major issues with prior work, including problems with algorithm implementation, computational complexity, and solution accuracy. The key idea is to decompose an arbitrarily complex nonlinear problem into a set of locally linear ones through local learning, and then to learn feature relevance globally within the large-margin framework. The proposed algorithm is based on well-established machine learning and numerical analysis techniques, and makes no assumptions about the underlying data distribution. It is capable of processing many thousands of features within minutes on a personal computer while maintaining a very high accuracy that is nearly insensitive to a growing number of irrelevant features. A theoretical analysis suggests that the algorithm's sample complexity grows logarithmically with the number of features. Experiments on 11 synthetic and real-world data sets demonstrate the viability of our formulation of the feature-selection problem for supervised learning and the effectiveness of our algorithm.
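The abstract describes the method only at a high level. As a rough, non-authoritative illustration of the stated idea (locally linearizing the problem via soft nearest hits and misses, then learning feature relevance globally with a margin-based loss), the Python sketch below fits nonnegative feature weights by minimizing a logistic large-margin loss with an l1 penalty. The function name `logo_feature_weights`, the soft-neighbor kernel, and all parameter defaults are assumptions made for illustration, not the authors' published implementation.

```python
import numpy as np

def logo_feature_weights(X, y, n_iter=30, lr=0.05, lam=1.0, sigma=1.0):
    """Simplified local-learning feature weighting (illustrative sketch).

    X: (n, d) data matrix; y: (n,) class labels, each class with >= 2 samples.
    Returns a nonnegative weight per feature; larger means more relevant.
    lr, lam, sigma are hypothetical defaults, not values from the paper.
    """
    n, d = X.shape
    w = np.ones(d)
    # Elementwise absolute differences |x_i - x_j|; O(n^2 d) memory, demo-scale only.
    diffs = np.abs(X[:, None, :] - X[None, :, :])          # (n, n, d)
    same = (y[:, None] == y[None, :]) & ~np.eye(n, dtype=bool)
    other = y[:, None] != y[None, :]
    for _ in range(n_iter):
        dist = diffs @ w                                   # weighted L1 distances (n, n)
        K = np.exp(-dist / sigma)                          # soft-nearest-neighbor kernel
        z = np.zeros((n, d))
        for i in range(n):
            ph = K[i] * same[i]                            # soft nearest-hit weights
            pm = K[i] * other[i]                           # soft nearest-miss weights
            ph /= ph.sum()
            pm /= pm.sum()
            # Expected margin vector: per-feature miss distance minus hit distance.
            z[i] = pm @ diffs[i] - ph @ diffs[i]
        # Gradient of (1/n) sum_i log(1 + exp(-w^T z_i)) + lam * ||w||_1, w >= 0.
        m = np.clip(z @ w, -60.0, 60.0)                    # avoid overflow in exp
        s = 1.0 / (1.0 + np.exp(m))                        # sigmoid(-w^T z_i)
        grad = -(s[:, None] * z).sum(axis=0) / n + lam
        w = np.maximum(w - lr * grad, 0.0)                 # projected gradient step
    return w

if __name__ == "__main__":
    # Toy check: 2 informative features among 50 irrelevant ones.
    rng = np.random.default_rng(0)
    informative = rng.normal(size=(100, 2))
    y = (informative.sum(axis=1) > 0).astype(int)
    X = np.hstack([informative, rng.normal(size=(100, 50))])
    w = logo_feature_weights(X, y)
    print(np.argsort(w)[::-1][:5])  # features 0 and 1 should tend to rank highest
```

The l1 penalty mirrors the abstract's emphasis on robustness to many irrelevant features: it shrinks the weights of features that do not consistently enlarge the local margins toward zero.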