A new improved maximal relevance and minimal redundancy method based on feature subset.

Author information

Xie Shanshan, Zhang Yan, Lv Danjv, Chen Xu, Lu Jing, Liu Jiang

Affiliations

College of Big Data and Intelligent Engineering, Southwest Forestry University, Kunming, 650224 China.

College of Mathematics and Physics, Southwest Forestry University, Kunming, 650224 China.

Publication information

J Supercomput. 2023;79(3):3157-3180. doi: 10.1007/s11227-022-04763-2. Epub 2022 Aug 30.

Abstract

Feature selection plays a significant role in the success of pattern recognition and data mining. Building on the maximal relevance and minimal redundancy (mRMR) method, this paper proposes an improved maximal relevance and minimal redundancy (ImRMR) feature selection method that operates on feature subsets. In ImRMR, the Pearson correlation coefficient and mutual information are first used to measure the relevance of each single feature to the sample category, and a factor is introduced to adjust the weights of the two measurement criteria. An equal grouping method is then applied to the ranked features to generate candidate feature subsets. Next, the relevance and redundancy of the candidate feature subsets are calculated, and an ordered sequence of these subsets is obtained by an incremental search method. Finally, the optimal feature subset is obtained from these subsets by combining the sequential forward search method with a classification learning algorithm. Experiments are conducted on seven datasets. The results show that ImRMR can effectively remove irrelevant and redundant features, which not only reduces the dimension of the sample features and the time of model training and prediction, but also improves classification performance.
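
The first two stages described in the abstract (scoring single features with a weighted mix of Pearson correlation and mutual information, then cutting the ranked list into equal-sized groups) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact formula, so the combination `alpha * |Pearson| + (1 - alpha) * MI`, the assumption that features are already discrete, and all function and parameter names (`rank_features`, `equal_groups`, `alpha`, `size`) are hypothetical choices made for illustration. The later stages (subset-level relevance and redundancy, incremental search, and sequential forward search with a classifier) are not shown.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(vx * vy)


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def rank_features(columns, labels, alpha=0.5):
    """Rank feature indices, best first.

    Assumed score (not taken from the paper):
    alpha * |Pearson| + (1 - alpha) * MI.
    """
    scores = [
        alpha * abs(pearson(col, labels))
        + (1 - alpha) * mutual_information(col, labels)
        for col in columns
    ]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)


def equal_groups(ranked, size):
    """Split a ranked list of feature indices into consecutive groups of `size`."""
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]
```

With `columns` given as one list of values per feature and `labels` as the class labels, `equal_groups(rank_features(columns, labels), size)` yields the candidate feature subsets in ranked order; the last group is shorter when the number of features is not a multiple of `size`.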

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ce96/9424812/093c880d7ad7/11227_2022_4763_Fig1_HTML.jpg
