Liu Jinghua, Yang Songwei, Zhang Hongbo, Sun Zhenzhen, Du Jixiang
Department of Computer Science and Technology, Huaqiao University, Xiamen 361021, China.
Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen 361021, China.
Entropy (Basel). 2023 Jul 17;25(7):1071. doi: 10.3390/e25071071.
Multi-label streaming feature selection has received widespread attention in recent years because acquiring features dynamically better matches the needs of practical application scenarios. Most previous methods either assume that labels are independent of each other or, when they do explore label correlation, leave the relationship between related labels and features difficult to understand or specify. In real applications, labels may be correlated while some features are relevant only to specific labels. Moreover, these methods evaluate features individually without considering interactions between features. To address these issues, we present a novel online streaming feature selection method based on label group correlation and feature interaction (OSLGC). In our design, we first divide labels into multiple groups with the help of graph theory. Then, we integrate label weights and mutual information to accurately quantify the relationships between features under different label groups. Subsequently, we design a novel feature selection framework based on sliding windows, which includes online feature relevance analysis and online feature interaction analysis. Experiments on ten datasets, covering predictive performance, statistical analysis, stability analysis, and ablation studies, show that the proposed method outperforms several mature multi-label feature selection (MFS) algorithms.
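The abstract outlines three ingredients: label grouping via a graph, label-weighted mutual information between features and label groups, and a sliding window over arriving features. The following is a minimal illustrative sketch of how those ingredients can fit together. It is not OSLGC and does not reproduce the authors' method. Several details are hypothetical assumptions: labels are grouped as connected components of a graph whose edges join label pairs with mutual information above a threshold, label weights are supplied by the caller (uniform within a group in the usage example), and a window of features is filtered by its best group relevance. All function names (`mutual_information`, `label_groups`, `group_relevance`, `stream_select`) are illustrative. The sketch omits the paper's online feature interaction analysis.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Empirical mutual information I(X;Y), in bits, of two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log2(p(a,b) / (p(a) p(b))), written with raw counts
        mi += (c / n) * log2(c * n / (px[a] * py[b]))
    return mi


def label_groups(labels, threshold):
    """Hypothetical grouping: connected components of a label graph in which
    labels i and j are joined when I(y_i; y_j) > threshold."""
    q = len(labels)
    adj = {i: [] for i in range(q)}
    for i in range(q):
        for j in range(i + 1, q):
            if mutual_information(labels[i], labels[j]) > threshold:
                adj[i].append(j)
                adj[j].append(i)
    seen, groups = set(), []
    for start in range(q):
        if start in seen:
            continue
        # Depth-first search collects one connected component
        stack, comp = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(sorted(comp))
    return groups


def group_relevance(feature, labels, group, weights):
    """Weighted relevance of a feature to a label group: sum of w_l * I(f; y_l)."""
    return sum(weights[l] * mutual_information(feature, labels[l]) for l in group)


def stream_select(feature_stream, labels, groups, weights, window_size, min_relevance):
    """Hypothetical sliding-window filter: buffer arriving features and, whenever
    the window is full (or the stream ends), keep those whose best group
    relevance exceeds min_relevance. Returns indices of kept features."""
    selected, window = [], []

    def flush():
        for idx, f in window:
            best = max(group_relevance(f, labels, g, weights) for g in groups)
            if best > min_relevance:
                selected.append(idx)
        window.clear()

    for idx, f in enumerate(feature_stream):
        window.append((idx, f))
        if len(window) == window_size:
            flush()
    if window:
        flush()
    return selected
```

For example, suppose two identical labels and a third label that is independent of them. With `threshold=0.5`, `label_groups` yields `[[0, 1], [2]]`. A stream containing a copy of the first label, a constant feature, and a copy of the third label then keeps features 0 and 2, because the constant feature carries no information about any label. Connected components are used here only because they are the simplest graph-based partition. The paper's own grouping procedure and weighting scheme may differ.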