Multi-Label Feature Selection Based on High-Order Label Correlation Assumption.

Authors

Zhang Ping, Gao Wanfu, Hu Juncheng, Li Yonghao

Affiliations

College of Computer Science and Technology, Jilin University, Changchun 130012, China.

Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China.

Publication information

Entropy (Basel). 2020 Jul 21;22(7):797. doi: 10.3390/e22070797.

Abstract

Multi-label data often involve high-dimensional features and complicated label correlations, which pose a great challenge for multi-label learning. Feature selection plays an important role in handling such data, and exploring label correlations is crucial for multi-label feature selection. Previous information-theoretic methods evaluate candidate features with a cumulative summation approximation, which considers only low-order label correlations. In fact, high-order label correlations exist in the label set: labels naturally cluster into several groups, with similar labels tending to fall into the same group and dissimilar labels into different groups. However, the cumulative summation approximation tends to select features related to the groups containing more labels while ignoring the classification information of groups containing fewer labels. As a result, many features related to similar labels are selected, which leads to poor classification performance. To address this, a Max-Correlation term that considers high-order label correlations is proposed. Additionally, we combine the Max-Correlation term with a feature redundancy term to ensure that the selected features are relevant to different label groups. Finally, a new method named Multi-label Feature Selection considering Max-Correlation (MCMFS) is proposed. Experimental results demonstrate the classification superiority of MCMFS over eight state-of-the-art multi-label feature selection methods.
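
The contrast the abstract draws between summing relevance over all labels and taking the maximum over labels can be illustrated with a small sketch. The abstract does not state the MCMFS formula, so this is not the authors' criterion. It is a simplified stand-in that scores a feature by its empirical mutual information with the labels (summed or maximized) minus its mean mutual information with the features already selected. The function names (`mutual_information`, `sum_relevance`, `max_relevance`, `greedy_select`) and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def sum_relevance(feature, labels):
    """Cumulative summation: add up the relevance to every label."""
    return sum(mutual_information(feature, label) for label in labels)


def max_relevance(feature, labels):
    """Max-style relevance (illustrative assumption): keep only the strongest label."""
    return max(mutual_information(feature, label) for label in labels)


def greedy_select(features, labels, k, relevance):
    """Greedy forward selection: relevance minus mean redundancy with chosen features."""
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        def score(j):
            redundancy = 0.0
            if selected:
                redundancy = sum(
                    mutual_information(features[j], features[s]) for s in selected
                ) / len(selected)
            return relevance(features[j], labels) - redundancy

        best = max(remaining, key=score)  # ties resolve to the lowest index
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: three identical labels form a large group, one label stands alone.
group_a = [0, 0, 0, 0, 1, 1, 1, 1]
group_b = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [group_a, group_a, group_a, group_b]

features = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # f0: determines the large group
    [0, 0, 1, 1, 2, 2, 3, 3],  # f1: also determines the large group
    [0, 1, 0, 1, 0, 1, 0, 1],  # f2: determines the small group
]

by_sum = greedy_select(features, labels, 2, sum_relevance)
by_max = greedy_select(features, labels, 2, max_relevance)
print("summation picks:", by_sum)
print("max picks:", by_max)
```

On this toy data the summation score picks two features that both describe the large label group, while the max-style score picks one feature per group, which is the behaviour the abstract attributes to combining a Max-Correlation term with a redundancy term.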

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1802/7517369/49ef0729df77/entropy-22-00797-g001.jpg
