Gupta Akshita, Narayan Sanath, Khan Salman, Khan Fahad Shahbaz, Shao Ling, van de Weijer Joost
IEEE Trans Pattern Anal Mach Intell. 2023 Dec;45(12):14611-14624. doi: 10.1109/TPAMI.2023.3295772. Epub 2023 Nov 3.
Multi-label zero-shot learning strives to classify images into multiple unseen categories for which no data is available during training. In the generalized variant, the test samples can additionally contain seen categories. Existing approaches rely on learning either shared or label-specific attention from the seen classes. Nevertheless, computing reliable attention maps for unseen classes during inference in a multi-label setting is still a challenge. In contrast, state-of-the-art single-label approaches based on generative adversarial networks (GANs) learn to directly synthesize class-specific visual features from the corresponding class attribute embeddings. However, synthesizing multi-label features with GANs remains unexplored in the zero-shot setting. When multiple objects occur jointly in a single image, a critical question is how to effectively fuse multi-class information. In this work, we introduce different fusion approaches at the attribute level, at the feature level and at the cross level (across the attribute and feature levels) for synthesizing multi-label features from their corresponding multi-label class embeddings. To the best of our knowledge, our work is the first to tackle the problem of multi-label feature synthesis in the (generalized) zero-shot setting. Our cross-level fusion-based generative approach outperforms the state of the art on three zero-shot benchmarks: NUS-WIDE, Open Images and MS COCO. Furthermore, we show the generalization capabilities of our fusion approach on the zero-shot detection task on MS COCO, where it achieves favorable performance against existing methods.
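The abstract does not specify the fusion operators, so the following is only a rough illustration of how attribute-level and feature-level fusion differ in where the multi-class information is combined. It assumes element-wise averaging as the fusion operator and a toy one-layer tanh network standing in for a trained GAN generator G(a, z); these choices, the shared noise vector and all function names (`toy_generator`, `attribute_level_fusion`, `feature_level_fusion`) are hypothetical and are not the authors' implementation. Cross-level fusion is not sketched because the abstract gives no detail about it.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Element-wise average of equal-length vectors (the fusion operator assumed here)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def toy_generator(attribute: Vector, noise: Vector, weights: List[Vector]) -> Vector:
    """Hypothetical stand-in for a trained GAN generator G(a, z).

    Maps the concatenation [a; z] to a feature vector with one linear layer
    followed by tanh. A real generator would be a learned deep network.
    """
    inputs = attribute + noise
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def attribute_level_fusion(label_embeddings: Sequence[Vector], noise: Vector,
                           weights: List[Vector]) -> Vector:
    """Fuse the class embeddings first, then synthesize a single feature."""
    return toy_generator(mean(label_embeddings), noise, weights)


def feature_level_fusion(label_embeddings: Sequence[Vector], noise: Vector,
                         weights: List[Vector]) -> Vector:
    """Synthesize one feature per label (same noise, an assumption), then fuse the features."""
    per_label = [toy_generator(a, noise, weights) for a in label_embeddings]
    return mean(per_label)


if __name__ == "__main__":
    rng = random.Random(0)
    attr_dim, noise_dim, feat_dim = 4, 2, 3
    weights = [[rng.uniform(-1, 1) for _ in range(attr_dim + noise_dim)]
               for _ in range(feat_dim)]
    # Random embeddings for two co-occurring (possibly unseen) classes.
    labels = [[rng.uniform(-1, 1) for _ in range(attr_dim)] for _ in range(2)]
    z = [rng.gauss(0, 1) for _ in range(noise_dim)]
    print("attribute-level:", attribute_level_fusion(labels, z, weights))
    print("feature-level:  ", feature_level_fusion(labels, z, weights))
```

Because the toy generator is nonlinear, fusing before generation and fusing after generation generally produce different synthetic features, while for an image with a single label the two paths coincide. This is the kind of design choice the fusion variants in the abstract explore.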