Faculty of Information and Technology, Beijing University of Technology, Beijing 100020, China.
Beijing-Dublin International College, Beijing University of Technology, Beijing 100020, China.
Comput Intell Neurosci. 2022 Oct 8;2022:9243893. doi: 10.1155/2022/9243893. eCollection 2022.
Feature selection is an important way to improve the efficiency and accuracy of classifiers. However, traditional feature selection methods cannot handle many kinds of real-world data, such as multi-label data. Multi-label feature selection was developed to overcome this limitation. It plays an important role in pattern recognition and data mining, and it can improve the efficiency and accuracy of multi-label classification. However, traditional multi-label feature selection based on mutual information does not fully consider the effect of redundancy among labels. This deficiency may lead to repeated computation of mutual information and leaves room to improve the accuracy of multi-label feature selection. To address this problem, this paper proposes a multi-label feature selection method based on conditional mutual information among labels (CRMIL). First, we analyze how to reduce the redundancy among features, building on existing work. Second, we propose a new approach to reduce the redundancy among labels: label sets are taken as conditions when calculating the relevance between features and labels, which weakens the impact of label redundancy on the feature selection results. Finally, we analyze the algorithm and balance the effects of relevance and redundancy in the evaluation function. To test CRMIL, we compare it with eight other multi-label feature selection algorithms on ten datasets, using four evaluation criteria. Experimental results show that CRMIL performs better than the existing algorithms it is compared against.
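The idea of taking labels as conditions can be illustrated with a plain plug-in estimate of conditional mutual information for discrete variables. The sketch below is not the CRMIL evaluation function, which the abstract does not spell out; the function name `conditional_mutual_information` and the toy arrays are assumptions made for illustration only. It shows that when one label duplicates another, the relevance of a feature to the duplicate label, conditioned on the original label, drops to zero.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in bits for discrete sequences.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(x)
    if n == 0 or not (n == len(y) == len(z)):
        raise ValueError("x, y and z must be non-empty and of equal length")

    # Joint and marginal counts needed for the estimate.
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)

    cmi = 0.0
    for (xi, yi, zi), n_xyz in c_xyz.items():
        # p(x,y,z) * log2( p(z) p(x,y,z) / (p(x,z) p(y,z)) ), written with counts.
        ratio = (c_z[zi] * n_xyz) / (c_xz[(xi, zi)] * c_yz[(yi, zi)])
        cmi += (n_xyz / n) * log2(ratio)
    return cmi


# Toy data (assumed): label_b is an exact copy of label_a, i.e. fully redundant.
feature = [0, 0, 1, 1, 0, 1, 0, 1]
label_a = [0, 0, 1, 1, 0, 1, 0, 1]
label_b = list(label_a)
no_condition = [0] * len(feature)  # a constant condition reduces to plain I(X; Y)

relevance_a = conditional_mutual_information(feature, label_a, no_condition)
relevance_b_given_a = conditional_mutual_information(feature, label_b, label_a)
```

Here `relevance_a` is the ordinary mutual information between the feature and the first label, while `relevance_b_given_a` measures what the second label adds once the first is known. Because the second label is redundant, it contributes nothing, so a score built on conditional terms would not count the same information twice.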