School of Computer Engineering, Jiangsu University of Technology, Jiangsu, Changzhou 213001, China.
Comput Intell Neurosci. 2021 Dec 28;2021:3569632. doi: 10.1155/2021/3569632. eCollection 2021.
Feature selection is a key step in the analysis of high-dimensional, small-sample data. Its core is to analyse and quantify the relevance between features and class labels and the redundancy between features. However, most existing feature selection algorithms consider only the classification contribution of individual features and ignore the influence of interfeature redundancy and relevance. Therefore, through the study and analysis of the ideas and methods of existing feature selection algorithms, this paper proposes a nonlinear dynamic conditional relevance feature selection algorithm (NDCRFS). Firstly, redundancy and relevance between features, and between features and class labels, are discriminated using mutual information, conditional mutual information, and interaction mutual information. Secondly, the selected features and candidate features are dynamically weighted using information gain factors. Finally, to evaluate the performance of the algorithm, NDCRFS was validated against 6 other feature selection algorithms on three classifiers, using 12 different data sets, comparing variability and classification metrics across the algorithms. The experimental results show that NDCRFS improves the quality of the selected feature subsets and obtains better classification results.
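The abstract does not give the NDCRFS selection criterion itself, but the information-theoretic quantities it builds on can be illustrated. The sketch below is a minimal, generic implementation of mutual information I(X;Y) and conditional mutual information I(X;Y|Z) for discrete variables; the function names and the empirical-frequency estimator are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from collections import Counter

def mutual_information(x, y):
    """Empirical I(X;Y) in bits for discrete sequences:
    sum over (a, b) of p(a, b) * log2(p(a, b) / (p(a) * p(b)))."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    pxy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a, b) = c / n; the ratio simplifies to c * n / (count(a) * count(b))
        mi += (c / n) * np.log2(c * n / (px[a] * py[b]))
    return mi

def conditional_mutual_information(x, y, z):
    """Empirical I(X;Y|Z) = sum over z-values of p(z) * I(X;Y | Z = z)."""
    n = len(z)
    cmi = 0.0
    for val, cnt in Counter(z).items():
        idx = [i for i in range(n) if z[i] == val]
        cmi += (cnt / n) * mutual_information([x[i] for i in idx],
                                              [y[i] for i in idx])
    return cmi
```

For example, with x = y = [0, 0, 1, 1] the estimate is 1 bit (perfect dependence), while for independent balanced binary variables it is 0; conditioning can reveal dependence that marginal mutual information misses, which is the kind of interfeature interaction a conditional-relevance criterion exploits.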