Xie Shanshan, Zhang Yan, Lv Danjv, Chen Xu, Lu Jing, Liu Jiang
College of Big Data and Intelligent Engineering, Southwest Forestry University, Kunming, 650224 China.
College of Mathematics and Physics, Southwest Forestry University, Kunming, 650224 China.
J Supercomput. 2023;79(3):3157-3180. doi: 10.1007/s11227-022-04763-2. Epub 2022 Aug 30.
Feature selection plays a significant role in the success of pattern recognition and data mining. Building on the maximal relevance and minimal redundancy (mRMR) method, this paper proposes an improved mRMR (ImRMR) feature selection method based on feature subsets. In ImRMR, the Pearson correlation coefficient and mutual information are first used to measure the relevance of each feature to the sample category, and a factor is introduced to adjust the weights of the two measurement criteria. An equal grouping method is then used to generate candidate feature subsets from the ranked features. Next, the relevance and redundancy of the candidate feature subsets are calculated, and an ordered sequence of these subsets is obtained by an incremental search method. Finally, the optimal feature subset is selected from these subsets by combining the sequential forward search method with a classification learning algorithm. Experiments are conducted on seven datasets. The results show that ImRMR effectively removes irrelevant and redundant features, which not only reduces the feature dimension and the time required for model training and prediction, but also improves classification performance.
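To make the relevance-scoring and equal-grouping steps concrete, the following is a minimal sketch in Python. It assumes a mixing factor `alpha` between the (normalized) absolute Pearson correlation and mutual information, and a fixed number of equal groups; the paper's exact weighting factor, normalization, and grouping scheme may differ, and the redundancy, incremental search, and sequential forward search stages are not shown.

```python
# Hedged sketch: relevance scoring (Pearson + mutual information, weighted by
# an assumed factor `alpha`) and equal grouping of the ranked features.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif


def relevance_scores(X, y, alpha=0.5):
    """Combine |Pearson correlation| and mutual information per feature."""
    # Absolute Pearson correlation of each feature with the class labels.
    pearson = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    )
    # Mutual information between each feature and the class labels.
    mi = mutual_info_classif(X, y, random_state=0)
    # Normalize both criteria to [0, 1] before mixing (an assumption,
    # since the two measures are on different scales).
    pearson = pearson / pearson.max()
    mi = mi / mi.max()
    return alpha * pearson + (1 - alpha) * mi


def equal_groups(ranked_features, n_groups=5):
    """Split the ranked feature indices into equally sized candidate subsets."""
    return np.array_split(ranked_features, n_groups)


X, y = load_breast_cancer(return_X_y=True)
scores = relevance_scores(X, y, alpha=0.5)
ranking = np.argsort(scores)[::-1]          # features ordered by relevance
candidate_subsets = equal_groups(ranking)   # equal grouping of ranked features
print([len(g) for g in candidate_subsets])
```

In practice, each candidate subset would then be scored for subset-level relevance and redundancy, ordered by an incremental search, and the final subset chosen with sequential forward search guided by a classifier, as described in the abstract.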