The Fisher-Markov selector: fast selecting maximally separable feature subset for multiclass classification with applications to high-dimensional data.

Affiliation

Department of Computer Science, Faner Hall, Mailcode 4511, Southern Illinois University Carbondale, 1000 Faner Drive, Carbondale, IL 62901, USA.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2011 Jun;33(6):1217-33. doi: 10.1109/TPAMI.2010.195.

DOI:10.1109/TPAMI.2010.195

PMID:21493968

Abstract

Selecting features for multiclass classification is a critically important task in pattern recognition and machine learning applications. Especially challenging is selecting an optimal subset of features from high-dimensional data, which typically have many more variables than observations and contain significant noise, missing components, or outliers. Existing methods either cannot handle high-dimensional data efficiently or scalably, or can obtain only a local optimum rather than the global optimum. To select the globally optimal subset of features efficiently, we introduce a new selector, which we call the Fisher-Markov selector, to identify the features that are most useful in describing essential differences among the possible groups. In particular, we present a way to represent essential discriminating characteristics together with sparsity as an optimization objective. With properly identified measures of sparseness and discriminativeness in possibly high-dimensional settings, we take a systematic approach to optimizing these measures to choose the best feature subset. We use Markov random field optimization techniques to solve the formulated objective functions for simultaneous feature selection. Our results are noncombinatorial, and they can achieve the exact global optimum of the objective function for some special kernels. The method is fast; in particular, it can be linear in the number of features and quadratic in the number of observations. We apply our procedure to a variety of real-world data, including a mid-dimensional optical handwritten digit data set and high-dimensional microarray gene expression data sets. The effectiveness of our method is confirmed by experimental results. From a pattern recognition and model selection viewpoint, our procedure shows that it is possible to select the most discriminating subset of variables by solving a very simple unconstrained objective function, which in fact can be obtained with an explicit expression.
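The closing remark of the abstract, that the best subset can come from a simple unconstrained objective with an explicit solution, can be illustrated with a minimal sketch. This is not the paper's formulation: the per-feature score (between-class scatter minus `gamma` times within-class scatter), the sparsity threshold `beta`, and the function names are all assumptions chosen for illustration, and only a plain linear (non-kernel) case is shown. The point is that when the objective is a sum of independent per-feature terms over binary indicators, its exact global maximizer is found by thresholding each term, with no combinatorial search.

```python
from collections import defaultdict


def feature_scores(X, y, gamma=1.0):
    """Hypothetical per-feature score: between-class minus gamma * within-class scatter."""
    n, p = len(X), len(X[0])
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    overall_mean = [sum(row[j] for row in X) / n for j in range(p)]
    scores = []
    for j in range(p):
        between = 0.0
        within = 0.0
        for rows in groups.values():
            mean_j = sum(r[j] for r in rows) / len(rows)
            between += len(rows) * (mean_j - overall_mean[j]) ** 2
            within += sum((r[j] - mean_j) ** 2 for r in rows)
        scores.append((between - gamma * within) / n)
    return scores


def select_features(X, y, gamma=1.0, beta=0.0):
    """Maximize sum_j a_j * (score_j - beta) over a_j in {0, 1}.

    The objective is separable across features, so the global optimum
    keeps feature j exactly when score_j > beta (assumed sparsity penalty).
    """
    return [j for j, s in enumerate(feature_scores(X, y, gamma)) if s > beta]


# Toy data: feature 0 separates the two classes, feature 1 is mostly noise.
X = [[0.0, 1.0], [0.2, 3.0], [5.0, 2.0], [5.2, 0.0]]
y = [0, 0, 1, 1]
selected = select_features(X, y)
print(selected)
```

In this toy setup, raising `beta` makes the selection sparser and lowering `gamma` makes it more permissive. The cost of this sketch grows linearly with the number of features; the quadratic dependence on the number of observations mentioned in the abstract relates to the general setting described there and is not reproduced here.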

Similar articles

Iterative RELIEF for feature weighting: algorithms, theories, and applications.

IEEE Trans Pattern Anal Mach Intell. 2007 Jun;29(6):1035-51. doi: 10.1109/TPAMI.2007.1093.

A novel feature selection approach for biomedical data classification.

J Biomed Inform. 2010 Feb;43(1):15-23. doi: 10.1016/j.jbi.2009.07.008. Epub 2009 Jul 30.

Combining multivariate voxel selection and support vector machines for mapping and classification of fMRI spatial patterns.

Neuroimage. 2008 Oct 15;43(1):44-58. doi: 10.1016/j.neuroimage.2008.06.037. Epub 2008 Jul 11.

Capitalize on dimensionality increasing techniques for improving Face Recognition Grand Challenge performance.

IEEE Trans Pattern Anal Mach Intell. 2006 May;28(5):725-37. doi: 10.1109/TPAMI.2006.90.

COMPARE: classification of morphological patterns using adaptive regional elements.

IEEE Trans Med Imaging. 2007 Jan;26(1):93-105. doi: 10.1109/TMI.2006.886812.

Guilt-by-association feature selection: identifying biomarkers from proteomic profiles.

J Biomed Inform. 2008 Feb;41(1):124-36. doi: 10.1016/j.jbi.2007.04.003. Epub 2007 Apr 14.

BM3E: discriminative density propagation for visual tracking.

IEEE Trans Pattern Anal Mach Intell. 2007 Nov;29(11):2030-44. doi: 10.1109/TPAMI.2007.1111.

MRF energy minimization and beyond via dual decomposition.

IEEE Trans Pattern Anal Mach Intell. 2011 Mar;33(3):531-52. doi: 10.1109/TPAMI.2010.108.

Feature selection with kernel class separability.

IEEE Trans Pattern Anal Mach Intell. 2008 Sep;30(9):1534-46. doi: 10.1109/TPAMI.2007.70799.

Cited by

Electromyogram in Cigarette Smoking Activity Recognition.

Signals (Basel). 2021 Mar;2(1):87-97. doi: 10.3390/signals2010008. Epub 2021 Feb 9.

Algorithmic Stability and Generalization of an Unsupervised Feature Selection Algorithm.

Adv Neural Inf Process Syst. 2021 Dec;34:19860-19875.

Fractal Autoencoders for Feature Selection.

Proc AAAI Conf Artif Intell. 2021 Feb;2021:10370-10378.

In silico prediction of HIV-1-host molecular interactions and their directionality.

PLoS Comput Biol. 2022 Feb 8;18(2):e1009720. doi: 10.1371/journal.pcbi.1009720. eCollection 2022 Feb.

iBLP: An XGBoost-Based Predictor for Identifying Bioluminescent Proteins.

Comput Math Methods Med. 2021 Jan 7;2021:6664362. doi: 10.1155/2021/6664362. eCollection 2021.

Discriminative Ridge Machine: A Classifier for High-Dimensional Data or Imbalanced Data.

IEEE Trans Neural Netw Learn Syst. 2021 Jun;32(6):2595-2609. doi: 10.1109/TNNLS.2020.3006877. Epub 2021 Jun 2.

Talk2Me: Automated linguistic data collection for personal assessment.

PLoS One. 2019 Mar 27;14(3):e0212342. doi: 10.1371/journal.pone.0212342. eCollection 2019.

Incorporating EBO-HSIC with SVM for Gene Selection Associated with Cervical Cancer Classification.

J Med Syst. 2018 Oct 6;42(11):225. doi: 10.1007/s10916-018-1092-5.

High-Throughput Identification of Mammalian Secreted Proteins Using Species-Specific Scheme and Application to Human Proteome.

Molecules. 2018 Jun 14;23(6):1448. doi: 10.3390/molecules23061448.

An IoT-Enabled Stroke Rehabilitation System Based on Smart Wearable Armband and Machine Learning.

IEEE J Transl Eng Health Med. 2018 May 8;6:2100510. doi: 10.1109/JTEHM.2018.2822681. eCollection 2018.
