Suppr超能文献

Fisher-Markov 选择器:用于多类分类的最大可分离特征子集的快速选择,应用于高维数据。

The Fisher-Markov selector: fast selecting maximally separable feature subset for multiclass classification with applications to high-dimensional data.

机构信息

Department of Computer Science, Faner Hall, Mailcode 4511, Southern Illinois University Carbondale, 1000 Faner Drive, Carbondale, IL 62901, USA.

出版信息

IEEE Trans Pattern Anal Mach Intell. 2011 Jun;33(6):1217-33. doi: 10.1109/TPAMI.2010.195.

Abstract

Selecting features for multiclass classification is a critically important task for pattern recognition and machine learning applications. Especially challenging is selecting an optimal subset of features from high-dimensional data, which typically have many more variables than observations and contain significant noise, missing components, or outliers. Existing methods either cannot handle high-dimensional data efficiently or scalably, or can only obtain local optimum instead of global optimum. Toward the selection of the globally optimal subset of features efficiently, we introduce a new selector--which we call the Fisher-Markov selector--to identify those features that are the most useful in describing essential differences among the possible groups. In particular, in this paper we present a way to represent essential discriminating characteristics together with the sparsity as an optimization objective. With properly identified measures for the sparseness and discriminativeness in possibly high-dimensional settings, we take a systematic approach for optimizing the measures to choose the best feature subset. We use Markov random field optimization techniques to solve the formulated objective functions for simultaneous feature selection. Our results are noncombinatorial, and they can achieve the exact global optimum of the objective function for some special kernels. The method is fast; in particular, it can be linear in the number of features and quadratic in the number of observations. We apply our procedure to a variety of real-world data, including mid--dimensional optical handwritten digit data set and high-dimensional microarray gene expression data sets. The effectiveness of our method is confirmed by experimental results. In pattern recognition and from a model selection viewpoint, our procedure says that it is possible to select the most discriminating subset of variables by solving a very simple unconstrained objective function which in fact can be obtained with an explicit expression.

摘要

多类分类的特征选择对于模式识别和机器学习应用来说是一项非常重要的任务。尤其具有挑战性的是从高维数据中选择最佳的特征子集,这些数据通常具有比观测值多得多的变量,并且包含大量的噪声、缺失成分或异常值。现有的方法要么不能有效地处理高维数据,要么不能可扩展地处理,要么只能获得局部最优,而不是全局最优。为了有效地选择全局最优的特征子集,我们引入了一种新的选择器——我们称之为 Fisher-Markov 选择器——来识别那些在描述可能的群体之间的基本差异方面最有用的特征。特别是,在本文中,我们提出了一种将基本区分特征与稀疏性表示为优化目标的方法。在可能的高维环境中,通过适当识别稀疏性和区分性的度量标准,我们采取系统的方法来优化这些度量标准,以选择最佳的特征子集。我们使用马尔可夫随机场优化技术来解决所提出的目标函数,以进行同时的特征选择。我们的结果是非组合的,并且对于某些特殊核,它们可以达到目标函数的精确全局最优。该方法速度很快;特别是,它在特征数量上是线性的,在观测数量上是二次的。我们将我们的方法应用于各种真实世界的数据,包括中维光学手写数字数据集和高维微阵列基因表达数据集。实验结果证实了我们方法的有效性。在模式识别和模型选择的角度来看,我们的过程表明通过求解一个非常简单的无约束目标函数,实际上可以通过显式表达式来选择最具区分性的变量子集。

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验