Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy.

Authors

Peng Hanchuan, Long Fuhui, Ding Chris

Affiliation

Lawrence Berkeley National Laboratory, University of California at Berkeley, 1 Cyclotron Road, MS. 84-171, Berkeley, CA 94720, USA.

Publication

IEEE Trans Pattern Anal Mach Intell. 2005 Aug;27(8):1226-38. doi: 10.1109/TPAMI.2005.159.

Abstract

Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection. Then, we present a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers). This allows us to select a compact set of superior features at very low cost. We perform extensive experimental comparison of our algorithm and other methods using three different classifiers (naive Bayes, support vector machine, and linear discriminant analysis) and four different data sets (handwritten digits, arrhythmia, NCI cancer cell lines, and lymphoma tissues). The results confirm that mRMR leads to promising improvement on feature selection and classification accuracy.
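The abstract names the first-order incremental mRMR criterion without stating it. The following is a minimal sketch, assuming discrete (already discretized) features and simple plug-in estimates of mutual information. It uses a difference form of the criterion: the first feature is the one with maximal relevance I(x; c), and each later step adds the candidate x_j that maximizes I(x_j; c) minus the mean of I(x_j; x_i) over the features x_i already selected. The names `mutual_information` and `mrmr_select` are hypothetical, and the sketch covers only the mRMR stage, not the second (wrapper) stage of the two-stage algorithm.

```python
from __future__ import annotations

from collections import Counter
from math import log2
from typing import Sequence


def mutual_information(x: Sequence, y: Sequence) -> float:
    """Plug-in estimate of I(x; y) in bits for two equal-length discrete sequences."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    pxy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        # p(a,b) * log2(p(a,b) / (p(a) p(b))), written with raw counts.
        mi += (count / n) * log2(count * n / (px[a] * py[b]))
    return mi


def mrmr_select(features: Sequence[Sequence], labels: Sequence, k: int) -> list[int]:
    """Greedy incremental selection with a relevance-minus-mean-redundancy score.

    `features` is a list of columns (one discrete value per sample).
    Returns the indices of up to k selected columns, in selection order.
    Ties are broken in favor of the smallest column index.
    """
    n_features = len(features)
    k = min(k, n_features)
    relevance = [mutual_information(col, labels) for col in features]
    # For each candidate j: running sum of I(x_j; x_i) over selected x_i.
    redundancy_sum = [0.0] * n_features
    selected: list[int] = []
    remaining = set(range(n_features))

    def score(j: int) -> float:
        if not selected:
            return relevance[j]  # first pick: maximal relevance only
        return relevance[j] - redundancy_sum[j] / len(selected)

    while len(selected) < k:
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
        # Update redundancy only against the newly added feature.
        for j in remaining:
            redundancy_sum[j] += mutual_information(features[j], features[best])
    return selected
```

For example, with a four-class label y = [0, 0, 1, 1, 2, 2, 3, 3], a column holding the high bit of y, an exact copy of that column, and a column holding the low bit, selecting two features returns the high bit and then the low bit: the copy is equally relevant but fully redundant with the first pick, so its score drops to zero. Keeping a running redundancy sum means each step only computes mutual information against the most recently added feature.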
