Lyu Gengyu, Yang Zhen, Deng Xiang, Feng Songhe
IEEE Trans Neural Netw Learn Syst. 2025 Apr;36(4):6569-6583. doi: 10.1109/TNNLS.2024.3390776. Epub 2025 Apr 4.
In the task of multiview multilabel (MVML) classification, each instance is represented by several heterogeneous features and associated with multiple semantic labels. Existing MVML methods mainly focus on leveraging a shared subspace to comprehensively explore consensus information across different views, yet it remains an open problem whether such a shared subspace representation is sufficient to characterize all relevant labels when formulating a desired MVML model. In this article, we propose a novel label-driven view-specific fusion MVML method named L-VSM, which bypasses the search for a shared subspace representation and instead directly encodes the feature representation of each individual view to contribute to the final multilabel classifier induction. Specifically, we first design a label-driven feature graph construction strategy and organize all instances, under their various feature representations, into corresponding view-specific feature graphs. Then, these view-specific feature graphs are integrated into a unified graph by linking the different feature representations of each instance. Afterward, we adopt a graph attention mechanism to aggregate and update all feature nodes on the unified graph and generate structural representations for each instance, where both intraview correlations and interview alignments are jointly encoded to discover the underlying consensuses and complementarities across different views. Moreover, to exploit the widespread label correlations in multilabel learning (MLL), a transformer architecture is introduced to construct a dynamic semantic-aware label graph and accordingly generate structural semantic representations for each specific class. Finally, we derive an instance-label affinity score for each instance by averaging the affinity scores of its different feature representations, and optimize the model with the multilabel soft margin loss. Extensive experiments on various MVML applications verify that the proposed L-VSM achieves superior performance against state-of-the-art methods. The codes are available at https://gengyulyu.github.io/homepage/assets/codes/LVSM.zip.
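The graph-attention aggregation over the unified multiview graph can be illustrated with a minimal sketch in PyTorch. The snippet below builds a toy unified graph whose nodes are the per-view feature representations of a few instances and runs a single-head graph-attention layer over it; the adjacency construction, dimensions, and names such as SimpleGraphAttention are illustrative assumptions and stand in for, rather than reproduce, the paper's label-driven feature-graph construction and exact attention architecture.

# Minimal sketch (PyTorch), assuming each instance contributes one node per view
# to a unified graph; the adjacency below is a hypothetical stand-in for the
# label-driven feature-graph construction described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGraphAttention(nn.Module):
    """Single-head GAT-style layer: aggregates neighbor features with
    learned attention weights restricted to the given adjacency."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn_src = nn.Linear(out_dim, 1, bias=False)
        self.attn_dst = nn.Linear(out_dim, 1, bias=False)

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) binary adjacency with self-loops
        h = self.proj(x)                                  # (N, out_dim)
        scores = self.attn_src(h) + self.attn_dst(h).T    # (N, N) pairwise attention logits
        scores = F.leaky_relu(scores, 0.2)
        scores = scores.masked_fill(adj == 0, float('-inf'))
        alpha = torch.softmax(scores, dim=-1)             # attention restricted to neighbors
        return F.elu(alpha @ h)                           # aggregated node embeddings

# Toy example: 4 instances x 2 views = 8 feature nodes, all views projected to dim 16.
torch.manual_seed(0)
num_instances, dim = 4, 16
x = torch.randn(num_instances * 2, dim)

# Hypothetical unified graph: intra-view links plus inter-view links joining
# the two view nodes of the same instance (nodes 0..3 are view 1, 4..7 are view 2).
adj = torch.eye(num_instances * 2)
for i in range(num_instances):                  # inter-view alignment edges
    adj[i, i + num_instances] = adj[i + num_instances, i] = 1.0
adj[:num_instances, :num_instances] = 1.0       # fully connect view-1 nodes (stand-in for label-driven edges)
adj[num_instances:, num_instances:] = 1.0       # fully connect view-2 nodes

layer = SimpleGraphAttention(dim, 32)
node_emb = layer(x, adj)                        # (8, 32) structural node representations
print(node_emb.shape)

Masking the attention logits outside the adjacency before the softmax is what confines aggregation to the constructed graph, so both intraview edges and the inter-view alignment edges influence each node's structural representation.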
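The final scoring step can likewise be sketched: assuming the per-view structural instance embeddings and per-class semantic embeddings are already computed, the snippet below forms instance-label affinities per view, averages them across views, and applies PyTorch's MultiLabelSoftMarginLoss. The inner-product affinity and all tensor names here are illustrative assumptions, not the paper's exact formulation.

# Minimal sketch (PyTorch) of instance-label affinity scoring and the
# multilabel soft margin loss; inst_emb and label_emb are hypothetical inputs.
import torch
import torch.nn as nn

num_instances, num_views, num_labels, dim = 4, 2, 5, 32
inst_emb = torch.randn(num_views, num_instances, dim)   # per-view instance embeddings
label_emb = torch.randn(num_labels, dim)                # per-class semantic embeddings
targets = torch.randint(0, 2, (num_instances, num_labels)).float()  # multi-hot labels

# Affinity of each view's representation with every label (here a plain inner
# product), then averaged over views to obtain the instance-label affinity score.
per_view_affinity = torch.einsum('vnd,ld->vnl', inst_emb, label_emb)  # (V, N, L)
affinity = per_view_affinity.mean(dim=0)                              # (N, L)

criterion = nn.MultiLabelSoftMarginLoss()
loss = criterion(affinity, targets)
print(loss.item())

Averaging the per-view affinities, rather than scoring a single fused representation, is what lets each individual view contribute directly to the multilabel classifier, in line with the view-specific fusion idea described above.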