Learning Graph Convolutional Networks for Multi-Label Recognition and Applications.

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):6969-6983. doi: 10.1109/TPAMI.2021.3063496. Epub 2023 May 5.

Abstract

The task of multi-label image recognition is to predict the set of object labels that are present in an image. Because objects normally co-occur in an image, it is desirable to model label dependencies to improve recognition performance. To capture and explore this important information, we propose models based on graph convolutional networks (GCNs) for multi-label image recognition, where directed graphs are constructed over classes and information is propagated between classes to learn inter-dependent class-level representations. Following this idea, we design two models that approach multi-label classification from different views. In our first model, prior knowledge about class dependencies is integrated into classifier learning. Specifically, we propose Classifier Learning GCN (C-GCN) to map class-level semantic representations (e.g., word embeddings) into classifiers that maintain the inter-class topology. In our second model, we decompose the visual representation of an image into a set of label-aware features and propose Prediction Learning GCN (P-GCN) to encode such features into inter-dependent image-level prediction scores. Furthermore, we present an effective correlation matrix construction approach to capture inter-class relationships and thereby guide information propagation among classes. Empirical results on generic multi-label image recognition demonstrate that both proposed models clearly outperform existing state-of-the-art methods. Moreover, the proposed methods also show advantages in some other applications related to multi-label classification.
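The classifier-learning idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the correlation matrix is built, so the conditional co-occurrence estimate below, the row normalisation, the ReLU activation, the two-layer depth, and all function names, shapes, and random weights are assumptions made for illustration only.

```python
import numpy as np

def cooccurrence_correlation(labels: np.ndarray) -> np.ndarray:
    """Directed correlation A[i, j] ~ P(label j | label i).

    Assumption: estimated from a binary (images x classes) label matrix;
    the abstract does not state the actual construction.
    """
    labels = labels.astype(np.float64)
    co = labels.T @ labels               # co[i, j] = images containing both i and j
    counts = np.diag(co).copy()          # counts[i] = images containing label i
    return co / np.maximum(counts, 1.0)[:, None]

def normalize_adjacency(A: np.ndarray) -> np.ndarray:
    """Row-normalise so each class takes a weighted average of its neighbours."""
    row_sum = A.sum(axis=1, keepdims=True)
    return A / np.maximum(row_sum, 1e-12)

def gcn_layer(A_hat, H, W, activate=True):
    """One graph convolution: propagate node features H along A_hat, then project by W."""
    out = A_hat @ H @ W
    return np.maximum(out, 0.0) if activate else out

def classifier_gcn_scores(image_features, label_embeddings, A_hat, W1, W2):
    """Map class embeddings to per-class classifiers, then score image features."""
    hidden = gcn_layer(A_hat, label_embeddings, W1)
    classifiers = gcn_layer(A_hat, hidden, W2, activate=False)  # (classes, feature_dim)
    return image_features @ classifiers.T                       # (images, classes)

# Toy data: 4 images, 3 classes (hypothetical values).
labels = np.array([[1, 1, 0],
                   [1, 0, 0],
                   [0, 1, 1],
                   [1, 1, 0]])
A = cooccurrence_correlation(labels)
A_hat = normalize_adjacency(A)

rng = np.random.default_rng(0)
label_embeddings = rng.normal(size=(3, 4))   # stand-in for word embeddings
W1 = rng.normal(size=(4, 5))                 # untrained, random weights
W2 = rng.normal(size=(5, 6))
image_features = rng.normal(size=(2, 6))     # stand-in for CNN features of 2 images
scores = classifier_gcn_scores(image_features, label_embeddings, A_hat, W1, W2)
```

In this toy example the matrix is asymmetric: class 2 always appears together with class 1, so `A[2, 1]` is 1, while `A[1, 2]` is only 1/3. That asymmetry is why a directed graph over classes is a natural fit. In practice the weights would be trained end to end with a multi-label loss rather than drawn at random.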
