Liu Xiling, Zhou Shuisheng
School of Mathematics and Statistics, Xidian University, Xi'an 710071, China.
Public Education Department, Zhengzhou University of Economics and Business, Zhengzhou 451191, China.
Entropy (Basel). 2023 Feb 10;25(2):325. doi: 10.3390/e25020325.
Feature selection is a vital task in machine learning and data mining. The maximum weight minimum redundancy feature selection method not only considers the importance of features but also reduces the redundancy among them. However, datasets differ in their characteristics, so a feature selection method should adapt its feature evaluation criterion to each dataset. Additionally, high-dimensional data make it challenging for feature selection methods to improve classification performance. This study presents a kernel partial least squares (KPLS) feature selection method based on an enhanced maximum weight minimum redundancy algorithm, which simplifies the computation and improves classification accuracy on high-dimensional datasets. By introducing a weight factor, the balance between the maximum weight term and the minimum redundancy term in the evaluation criterion can be adjusted, yielding an improved maximum weight minimum redundancy method. The proposed KPLS feature selection method accounts for both the redundancy among features and the weight between each feature and the class label across different datasets. Moreover, its classification accuracy has been evaluated on noisy data and on several datasets. Experimental results on different datasets demonstrate the feasibility and effectiveness of the proposed method, which selects an optimal feature subset and achieves strong classification performance on three different metrics compared with other feature selection methods.
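The role of the weight factor can be illustrated with a generic greedy maximum weight minimum redundancy loop. The sketch below is not the authors' algorithm: the abstract does not give the exact criterion or how the KPLS feature weights are computed. It assumes hypothetical inputs (per-feature `weights` already computed by some scoring step, for example a KPLS-based one), uses absolute Pearson correlation as a stand-in redundancy measure, and scores each candidate as `alpha * weight - (1 - alpha) * mean redundancy with the features already selected`. The names `weighted_mwmr`, `pearson_abs`, and `alpha` are illustrative only.

```python
from math import sqrt
from typing import List, Sequence


def pearson_abs(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Pearson correlation, used here as an assumed redundancy measure."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant feature carries no linear redundancy
    return abs(cov) / sqrt(vx * vy)


def weighted_mwmr(columns: List[List[float]], weights: List[float],
                  k: int, alpha: float = 0.5) -> List[int]:
    """Hypothetical greedy selection of k feature indices.

    alpha plays the role of a weight factor: alpha = 1 ranks by weight only,
    smaller alpha penalizes redundancy with already-selected features more.
    """
    selected: List[int] = []
    remaining = list(range(len(columns)))

    def score(j: int) -> float:
        # Mean redundancy between candidate j and the current selection.
        if not selected:
            red = 0.0
        else:
            red = sum(pearson_abs(columns[j], columns[s]) for s in selected) / len(selected)
        return alpha * weights[j] - (1 - alpha) * red

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: feature 1 is an exact multiple of feature 0, feature 2 is weakly correlated.
cols = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 1, 4, 2, 3]]
w = [0.9, 0.85, 0.6]
print(weighted_mwmr(cols, w, 2, alpha=0.5))  # [0, 2]
print(weighted_mwmr(cols, w, 2, alpha=1.0))  # [0, 1]
```

With `alpha = 1.0` the redundant copy of feature 0 is chosen because it has the higher weight; with `alpha = 0.5` the redundancy penalty (correlation 1.0 versus 0.3) steers the selection toward the less redundant feature 2.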