Zhang Ping, Gao Wanfu, Hu Juncheng, Li Yonghao
College of Computer Science and Technology, Jilin University, Changchun 130012, China.
Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China.
Entropy (Basel). 2020 Jul 21;22(7):797. doi: 10.3390/e22070797.
Multi-label data often involve high-dimensional features and complicated label correlations, which pose a great challenge for multi-label learning. Feature selection plays an important role in handling multi-label data, and exploring label correlations is crucial for multi-label feature selection. Previous information-theoretic methods evaluate candidate features with a cumulative summation approximation, which considers only low-order label correlations. In fact, label sets also exhibit high-order label correlations: labels naturally cluster into several groups, with similar labels tending to fall into the same group and dissimilar labels into different groups. However, the cumulative summation approximation tends to select features related to groups that contain more labels while ignoring the classification information of groups that contain fewer labels. As a result, many features related to similar labels are selected, which leads to poor classification performance. To this end, we propose a Max-Correlation term that considers high-order label correlations. We further combine the Max-Correlation term with a feature redundancy term to ensure that the selected features are relevant to different label groups. The resulting method is named Multi-label Feature Selection considering Max-Correlation (MCMFS). Experimental results demonstrate the superior classification performance of MCMFS compared with eight state-of-the-art multi-label feature selection methods.
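The bias of cumulative summation described above can be illustrated on toy data. The sketch below is hypothetical and does not reproduce MCMFS: the abstract does not give the MCMFS formula, and this toy ignores the high-order label correlations that the Max-Correlation term is designed to capture. It only contrasts a sum-over-labels relevance with a simplified max-over-labels relevance, each minus an average mutual-information redundancy term, inside a greedy forward selection. The names `mutual_information`, `greedy_select`, `f_group` and `f_single` are illustrative assumptions. Labels l1, l2 and l3 are identical copies forming one large group; l4 alone forms a second group.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Empirical mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # sum over observed pairs of p(a,b) * log2(p(a,b) / (p(a) p(b)))
    return sum((c / n) * log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def greedy_select(features, labels, k, aggregate):
    """Greedy forward selection with a hypothetical criterion:
    J(f) = aggregate_j I(f; l_j) - mean_{s in S} I(f; s).
    `aggregate` is `sum` (cumulative summation) or `max` (simplified max-correlation)."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = aggregate(mutual_information(features[name], l) for l in labels)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: one large label group (three identical labels) and one single-label group.
L1 = [0, 0, 0, 0, 1, 1, 1, 1]
L4 = [0, 0, 1, 1, 0, 0, 1, 1]
LABELS = [L1, list(L1), list(L1), L4]

FEATURES = {
    "f_group": [0, 0, 0, 0, 1, 1, 1, 0],   # noisy predictor of the large group
    "f_single": [0, 0, 1, 1, 0, 0, 1, 1],  # perfect predictor of the small group
}

if __name__ == "__main__":
    print(greedy_select(FEATURES, LABELS, 1, sum))  # → ['f_group']
    print(greedy_select(FEATURES, LABELS, 1, max))  # → ['f_single']
```

Because `f_group` is weakly informative about each of three identical labels, its summed relevance (about 1.70 bits) beats the 1 bit of `f_single`, even though `f_single` alone fully determines the small group's label. Taking the maximum over labels instead ranks `f_single` first, which matches the abstract's motivation for replacing cumulative summation.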