Gu Zhangxuan, Zhou Siyuan, Niu Li, Zhao Zihan, Zhang Liqing
IEEE Trans Neural Netw Learn Syst. 2023 Oct;34(10):7689-7703. doi: 10.1109/TNNLS.2022.3145962. Epub 2023 Oct 5.
Zero-shot learning (ZSL) has been actively studied for image classification tasks to relieve the burden of annotating image labels. Interestingly, although the semantic segmentation task requires even more labor-intensive pixel-wise annotation, zero-shot semantic segmentation has not attracted extensive research interest. Thus, we focus on zero-shot semantic segmentation, which aims to segment unseen objects with only category-level semantic representations provided for unseen categories. In this article, we propose a novel context-aware feature generation network (CaGNet) that can synthesize context-aware pixel-wise visual features for unseen categories based on category-level semantic representations and pixel-wise contextual information. The synthesized features are used to fine-tune the classifier to enable segmentation of unseen objects. Furthermore, we extend pixel-wise feature generation and fine-tuning to patch-wise feature generation and fine-tuning, which additionally considers the interpixel relationship. Experimental results on Pascal-VOC, Pascal-Context, and COCO-Stuff show that our method significantly outperforms the existing zero-shot semantic segmentation methods.
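The core idea in the abstract can be illustrated with a minimal sketch: a generator conditioned on a category-level semantic vector and a per-pixel contextual latent code synthesizes pixel-wise features for unseen categories, and a classifier is then fitted on those synthetic features. All names, dimensions, and the toy generator below are illustrative assumptions, not CaGNet's actual architecture.

```python
import numpy as np

# Hedged sketch of semantic-conditioned pixel-wise feature generation.
# Dimensions and the single-layer generator are illustrative assumptions.

rng = np.random.default_rng(0)

SEM_DIM, CTX_DIM, FEAT_DIM = 300, 16, 64  # assumed embedding sizes

# Toy generator: one linear layer with tanh (stand-in for the real network).
W = rng.standard_normal((SEM_DIM + CTX_DIM, FEAT_DIM)) * 0.05

def generate_features(semantic_vec, n_pixels):
    """Synthesize n_pixels pixel-wise features for one unseen category."""
    ctx = rng.standard_normal((n_pixels, CTX_DIM))          # contextual latent codes
    sem = np.tile(semantic_vec, (n_pixels, 1))              # broadcast category semantics
    return np.tanh(np.concatenate([sem, ctx], axis=1) @ W)  # (n_pixels, FEAT_DIM)

# Two hypothetical unseen categories with word-vector-style semantics.
unseen_semantics = rng.standard_normal((2, SEM_DIM))
feats = np.vstack([generate_features(s, 100) for s in unseen_semantics])
labels = np.repeat([0, 1], 100)

# "Fine-tune" a nearest-centroid classifier on the synthetic features
# (a stand-in for fine-tuning the segmentation classifier head).
centroids = np.stack([feats[labels == c].mean(axis=0) for c in (0, 1)])
pred = np.argmin(((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1)
accuracy = (pred == labels).mean()
print(feats.shape, accuracy)
```

Because the semantic vector is shared by all pixels of a category while the contextual code varies per pixel, the synthetic features cluster by category, which is what makes fitting a classifier for unseen categories possible.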