
Identifying (Quasi) Equally Informative Subsets in Feature Selection Problems for Classification: A Max-Relevance Min-Redundancy Approach.

Publication Information

IEEE Trans Cybern. 2016 Jun;46(6):1424-37. doi: 10.1109/TCYB.2015.2444435. Epub 2015 Jul 6.

Abstract

An emerging trend in feature selection is the development of two-objective algorithms that analyze the tradeoff between the number of features and the classification performance of the model built with these features. Since these two objectives are conflicting, the typical result is a set of Pareto-efficient subsets, each with a different cardinality and a corresponding discriminating power. However, this approach overlooks the fact that, for a given cardinality, there can be several subsets with similar information content. The study reported here addresses this problem and introduces a novel multiobjective feature selection approach conceived to identify: 1) a subset that maximizes the performance of a given classifier and 2) a set of subsets that are quasi equally informative, i.e., have almost the same classification performance as the performance-maximizing subset. The approach consists of a wrapper [Wrapper for Quasi Equally Informative Subset Selection (W-QEISS)] built on the formulation of a four-objective optimization problem, which aims at maximizing the accuracy of a classifier, minimizing the number of features, and optimizing two entropy-based measures of relevance and redundancy. This allows the search to be conducted in a larger space, thus enabling the wrapper to generate a large number of Pareto-efficient solutions. The algorithm is compared against the mRMR algorithm, a two-objective wrapper, and a computationally efficient filter [Filter for Quasi Equally Informative Subset Selection (F-QEISS)] on 24 University of California, Irvine (UCI) datasets covering both binary and multiclass classification. Experimental results show that W-QEISS is capable of evolving a rich and diverse set of Pareto-efficient solutions, and that their availability helps in: 1) studying the tradeoff between multiple measures of classification performance and 2) understanding the relative importance of each feature. The quasi equally informative subsets are identified at the cost of only a marginal increase in computational time, thanks to the adoption of the Borg Multiobjective Evolutionary Algorithm and the Extreme Learning Machine as the global optimization and learning algorithms, respectively.
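To make the four-objective formulation concrete, the sketch below shows how the objectives described in the abstract (classifier accuracy, subset cardinality, entropy-based relevance, and entropy-based redundancy) might be evaluated for a single candidate feature subset. This is a minimal illustration, not the authors' implementation: it assumes mutual information as the entropy-based relevance/redundancy proxy, substitutes a plain scikit-learn logistic regression for the Extreme Learning Machine used in the paper, uses a simple histogram discretization, and omits the Borg Multiobjective Evolutionary Algorithm that drives the actual search.

```python
# Minimal sketch (not the authors' code) of evaluating the four W-QEISS-style
# objectives for one candidate feature subset.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mutual_info_score
from sklearn.model_selection import cross_val_score


def evaluate_subset(X, y, subset):
    """Return (accuracy, cardinality, relevance, redundancy) for a feature subset."""
    subset = list(subset)

    # Objective 1: classification accuracy of a model trained on the subset
    # (the paper uses an Extreme Learning Machine; logistic regression is a stand-in).
    clf = LogisticRegression(max_iter=1000)
    accuracy = cross_val_score(clf, X[:, subset], y, cv=5).mean()

    # Objective 2: number of selected features (to be minimized).
    cardinality = len(subset)

    # Discretize features so mutual information can be estimated by counting
    # (a simplification; the paper's entropy estimators may differ).
    Xd = np.array([np.digitize(X[:, j], np.histogram_bin_edges(X[:, j], bins=10))
                   for j in subset]).T

    # Objective 3: relevance -- average mutual information between each selected
    # feature and the class label (max-relevance term, to be maximized).
    relevance = np.mean([mutual_info_score(Xd[:, i], y) for i in range(len(subset))])

    # Objective 4: redundancy -- average pairwise mutual information among the
    # selected features (min-redundancy term, to be minimized).
    if len(subset) > 1:
        pairs = [(i, k) for i in range(len(subset)) for k in range(i + 1, len(subset))]
        redundancy = np.mean([mutual_info_score(Xd[:, i], Xd[:, k]) for i, k in pairs])
    else:
        redundancy = 0.0

    return accuracy, cardinality, relevance, redundancy


if __name__ == "__main__":
    X, y = load_breast_cancer(return_X_y=True)
    print(evaluate_subset(X, y, subset=[0, 7, 20, 27]))
```

In the wrapper described by the abstract, a multiobjective evolutionary algorithm would repeatedly call a function of this kind on candidate subsets and retain the Pareto-efficient ones, which is how the quasi equally informative subsets of a given cardinality emerge.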

