A Multi-Group Multi-Stream attribute Attention network for fine-grained zero-shot learning.
Author information
School of Computer Science, Northwestern Polytechnical University, Xi'an, 710129, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710129, China.
SPKLSTN Lab, Department of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China.
Publication information
Neural Netw. 2024 Nov;179:106558. doi: 10.1016/j.neunet.2024.106558. Epub 2024 Jul 20.
Fine-grained visual categorization in the zero-shot setting is a challenging problem in computer vision. It requires algorithms to accurately identify fine-grained categories that do not appear during training and that are highly similar to one another visually. Existing methods usually address this problem by using attribute information as intermediate knowledge, which provides sufficient fine-grained characteristics of categories and can be transferred from seen categories to unseen ones. However, learning attribute visual features is not trivial, for two reasons: (i) the visual information of attributes of different types may interfere with each other's visual feature learning; (ii) the visual characteristics of the same attribute may vary across categories. To address these issues, we propose a Multi-Group Multi-Stream attribute Attention network (MGMSA), which not only separates the feature learning of attributes of different types, but also isolates the learning of attribute visual features for categories with large differences in attribute appearance. This avoids interference between uncorrelated attributes and helps to learn category-specific, attribute-related visual features, which is beneficial for distinguishing fine-grained categories with subtle visual differences. Extensive experiments on benchmark datasets show that MGMSA achieves state-of-the-art performance on attribute prediction and fine-grained zero-shot learning.
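The general idea of using attributes as intermediate knowledge can be illustrated with a minimal sketch. This is not the MGMSA architecture, whose details are not given in the abstract. It only shows the generic transfer step, assuming a model has already produced per-attribute scores for an image and that each unseen class comes with a known attribute signature. The function names, class names, and numbers are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def zero_shot_predict(attr_scores, class_signatures):
    """Return the unseen class whose attribute signature best matches the scores.

    attr_scores: predicted attribute scores for one image (assumed to come
        from an attribute predictor trained on seen classes only).
    class_signatures: dict mapping class name -> attribute vector.
    """
    return max(class_signatures,
               key=lambda name: cosine(attr_scores, class_signatures[name]))


# Hypothetical toy data: four attributes, three unseen classes.
UNSEEN_SIGNATURES = {
    "class_a": [1, 0, 1, 0],
    "class_b": [1, 0, 0, 1],
    "class_c": [0, 1, 1, 0],
}
predicted_scores = [0.9, 0.1, 0.2, 0.8]

print(zero_shot_predict(predicted_scores, UNSEEN_SIGNATURES))  # prints "class_b"
```

Because classification depends only on attribute signatures, classes never seen in training can be recognized as long as their attributes are described. This is also why the quality of the attribute predictions, the focus of the abstract, directly affects zero-shot accuracy.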