Wang Zhenwu, Wang Tielin, Wan Benting, Han Mengjie
Department of Computer Science and Technology, China University of Mining and Technology, Beijing 100083, China.
School of Software and IoT Engineering, Jiangxi University of Finance & Economics, Nanchang 330013, China.
Entropy (Basel). 2020 Oct 10;22(10):1143. doi: 10.3390/e22101143.
Multi-label classification (MLC) is a supervised learning problem in which an object is naturally associated with multiple concepts because it can be described along several dimensions. How to exploit the resulting label correlations is the key issue in MLC. The classifier chain (CC) is a well-known MLC approach that can learn complex coupling relationships between labels. However, CC suffers from two obvious drawbacks: (1) the label ordering is decided at random, although it usually has a strong effect on predictive performance; (2) all labels are inserted into the chain, although some of them may carry irrelevant information that harms the prediction of the others. In this work, we propose a partial classifier chain method with feature selection (PCC-FS) that exploits correlations between the label and feature spaces and thereby addresses both problems simultaneously. In the PCC-FS algorithm, feature selection is performed by learning the covariance between the feature set and the label set, which eliminates irrelevant features that can diminish classification performance. Couplings in the label set are then extracted, and the coupled labels of each label are inserted simultaneously into the chain structure for training and prediction. Experimental results on five metrics demonstrate that, in comparison with eight state-of-the-art MLC algorithms, the proposed method significantly improves on existing multi-label classification approaches.
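The two ideas described in the abstract, covariance-based feature selection and a chain that passes only coupled labels to each classifier, can be illustrated with a minimal Python sketch. This is not the PCC-FS algorithm as published: the scoring rule (summed absolute covariance), the correlation threshold, the use of label index order as the chain order, the nearest-centroid base classifier, the toy data, and all function names are assumptions made for illustration only.

```python
from math import sqrt

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def corr(a, b):
    denom = sqrt(cov(a, a) * cov(b, b))
    return cov(a, b) / denom if denom else 0.0

def column(M, j):
    return [row[j] for row in M]

def select_features(X, Y, k):
    """Keep the k features with the largest summed |covariance| with the labels
    (an assumed scoring rule, not necessarily the one used by PCC-FS)."""
    n_feat, n_lab = len(X[0]), len(Y[0])
    score = [sum(abs(cov(column(X, f), column(Y, l))) for l in range(n_lab))
             for f in range(n_feat)]
    return sorted(sorted(range(n_feat), key=lambda f: -score[f])[:k])

def coupled_parents(Y, threshold):
    """For each label j, list the earlier labels i < j whose |correlation|
    with j reaches the threshold; only these enter j's classifier."""
    return [[i for i in range(j)
             if abs(corr(column(Y, i), column(Y, j))) >= threshold]
            for j in range(len(Y[0]))]

class NearestCentroid:
    """Tiny stand-in base classifier; any binary classifier could be used."""
    def fit(self, rows, targets):
        self.centroids = {}
        for c in set(targets):
            members = [r for r, t in zip(rows, targets) if t == c]
            self.centroids[c] = [mean(col) for col in zip(*members)]
        return self

    def predict_one(self, row):
        return min(self.centroids,
                   key=lambda c: sum((a - b) ** 2
                                     for a, b in zip(row, self.centroids[c])))

def fit_partial_chain(X, Y, k, threshold):
    feats = select_features(X, Y, k)
    parents = coupled_parents(Y, threshold)
    models = []
    for j, pj in enumerate(parents):
        # Training input: selected features plus the true values of coupled labels.
        rows = [[x[f] for f in feats] + [y[i] for i in pj] for x, y in zip(X, Y)]
        models.append(NearestCentroid().fit(rows, column(Y, j)))
    return feats, parents, models

def predict_partial_chain(chain, x):
    feats, parents, models = chain
    y = []
    for pj, model in zip(parents, models):
        # Prediction input: selected features plus already predicted coupled labels.
        row = [x[f] for f in feats] + [y[i] for i in pj]
        y.append(model.predict_one(row))
    return y

# Toy data (invented): feature 1 is constant, label 1 duplicates label 0.
X = [[0.0, 0.5, 0.0], [0.1, 0.5, 1.0], [1.0, 0.5, 0.1],
     [0.9, 0.5, 0.9], [0.0, 0.5, 1.0], [1.0, 0.5, 0.0]]
Y = [[0, 0, 0], [0, 0, 1], [1, 1, 0],
     [1, 1, 1], [0, 0, 1], [1, 1, 0]]

chain = fit_partial_chain(X, Y, k=2, threshold=0.5)
print(chain[0])   # indices of the selected features
print(chain[1])   # coupled parent labels per label
print(predict_partial_chain(chain, [0.95, 0.5, 0.05]))
```

On this toy data the constant feature has zero covariance with every label and is dropped, and only label 1 receives a chained input (label 0), while label 2 is predicted from the features alone. A full classifier chain would instead feed every preceding label into each classifier.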