IEEE Trans Pattern Anal Mach Intell. 2022 Mar;44(3):1371-1384. doi: 10.1109/TPAMI.2020.3025814. Epub 2022 Feb 3.
Recognizing multiple labels of an image is a practical yet challenging task, and remarkable progress has been achieved by searching for semantic regions and exploiting label dependencies. However, current works use RNNs/LSTMs to implicitly capture sequential region/label dependencies; these models cannot fully explore mutual interactions among the semantic regions/labels and do not explicitly integrate label co-occurrences. In addition, these works require large numbers of training samples for each category and cannot generalize to novel categories with limited samples. To address these issues, we propose a knowledge-guided graph routing (KGGR) framework, which unifies prior knowledge of statistical label correlations with deep neural networks. The framework exploits prior knowledge to guide adaptive information propagation among different categories, facilitating multi-label analysis and reducing the dependence on training samples. Specifically, it first builds a structured knowledge graph that correlates different labels based on statistical label co-occurrence. Then, it introduces label semantics to guide the learning of semantic-specific features that initialize the graph, and it exploits a graph propagation network to explore graph node interactions, enabling it to learn contextualized image feature representations. Moreover, we initialize each graph node with the classifier weights for the corresponding label and apply another propagation network to transfer node messages through the graph. In this way, the framework can exploit the information of correlated labels to help train better classifiers, especially for labels with limited training samples. We conduct extensive experiments on the traditional multi-label image recognition (MLR) and multi-label few-shot learning (ML-FSL) tasks and show that our KGGR framework outperforms the current state-of-the-art methods by sizable margins on public benchmarks.
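The two ideas the abstract names, a label graph built from co-occurrence statistics and message passing over that graph, can be illustrated with a minimal NumPy sketch. It rests on assumptions that are not taken from the paper: the function names `cooccurrence_graph` and `propagate`, the conditional-frequency edge weight, and the mixing factor `alpha` are all hypothetical. The propagation step is a plain weighted average standing in for the learned propagation network, not the authors' method.

```python
import numpy as np


def cooccurrence_graph(labels):
    """Build a directed label graph from a binary multi-hot label matrix.

    labels: array of shape (num_images, num_classes), entries 0 or 1.
    Returns A where A[i, j] estimates P(label j present | label i present).
    (Assumed edge definition, chosen for illustration only.)
    """
    labels = np.asarray(labels, dtype=np.float64)
    counts = labels.T @ labels               # counts[i, j]: images containing both i and j
    occurrences = np.diag(counts).copy()     # occurrences[i]: images containing label i
    np.fill_diagonal(counts, 0.0)            # no self-loops
    denom = np.maximum(occurrences, 1.0)[:, None]  # avoid division by zero for unseen labels
    return counts / denom


def propagate(node_features, adjacency, steps=1, alpha=0.5):
    """Mix each node's features with a weighted average of its neighbours.

    A simple, non-learned stand-in for a graph propagation network.
    Nodes with no outgoing edges keep their features unchanged.
    """
    h = np.asarray(node_features, dtype=np.float64)
    adjacency = np.asarray(adjacency, dtype=np.float64)
    row_sums = adjacency.sum(axis=1, keepdims=True)
    has_neighbours = row_sums > 0
    # Row-normalise so each node receives a convex combination of neighbour features.
    norm_adj = np.divide(adjacency, row_sums,
                         out=np.zeros_like(adjacency), where=has_neighbours)
    for _ in range(steps):
        messages = norm_adj @ h
        h = np.where(has_neighbours, (1.0 - alpha) * h + alpha * messages, h)
    return h
```

In the classifier-weight setting the abstract describes, `node_features` would hold one row of classifier weights per label, so a label with few training samples borrows from the labels it frequently co-occurs with.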