Zhang Xiang, Zhao Wanqing, Zhang Wei, Peng Jinye, Fan Jianping
IEEE Trans Image Process. 2022;31:2695-2709. doi: 10.1109/TIP.2022.3160399. Epub 2022 Mar 29.
Existing publicly available datasets with pixel-level labels cover only a limited number of categories, so models trained on them are difficult to generalize to the real world, which contains thousands of categories. In this paper, we propose an approach that automatically generates object masks with detailed pixel-level structures/boundaries, enabling semantic image segmentation of thousands of targets in the real world without manual labelling. A Guided Filter Network (GFN) is first developed to learn segmentation knowledge from an existing dataset; the GFN then transfers the learned segmentation knowledge to generate initial coarse object masks for the target images. These coarse object masks are treated as pseudo labels to iteratively self-optimize the GFN on the target images. Our experiments on six image sets demonstrate that the proposed approach can generate object masks with detailed pixel-level structures/boundaries whose quality is comparable to that of manually labelled masks. The proposed approach also achieves better semantic image segmentation performance than most existing weakly-supervised, semi-supervised, and domain adaptation approaches under the same experimental conditions.
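The iterative pseudo-label loop described in the abstract can be illustrated with a deliberately tiny sketch. This is not the GFN: as an assumption made purely for illustration, a one-parameter intensity threshold stands in for the network, and the pixel values, the function names (`fit_threshold`, `predict_mask`, `self_train`), and the number of rounds are all hypothetical. The sketch only shows the general pattern of learning on a labelled source set, predicting coarse masks on unlabelled target data, and refitting on those masks as pseudo labels.

```python
from statistics import mean

def fit_threshold(pixels, labels):
    """Fit a toy 'segmenter': the midpoint between the two class means."""
    fg = [p for p, y in zip(pixels, labels) if y == 1]
    bg = [p for p, y in zip(pixels, labels) if y == 0]
    return (mean(fg) + mean(bg)) / 2.0

def predict_mask(pixels, threshold):
    """Label a pixel as foreground (1) if it is brighter than the threshold."""
    return [1 if p > threshold else 0 for p in pixels]

def accuracy(pred, truth):
    """Fraction of pixels whose predicted label matches the reference label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def self_train(source_pixels, source_labels, target_pixels, rounds=5):
    """Learn on the labelled source, then refit on target pseudo labels."""
    # Step 1: learn segmentation knowledge from the labelled source set.
    threshold = fit_threshold(source_pixels, source_labels)
    history = [threshold]
    for _ in range(rounds):
        # Step 2: generate coarse masks for the unlabelled target pixels.
        pseudo = predict_mask(target_pixels, threshold)
        if len(set(pseudo)) < 2:
            break  # one class vanished; refitting would be undefined
        # Step 3: treat the masks as pseudo labels and refit on the target.
        threshold = fit_threshold(target_pixels, pseudo)
        history.append(threshold)
    return threshold, history

# Hypothetical data: the target set is darker than the source set.
source_pixels = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
source_labels = [0, 0, 0, 1, 1, 1]
target_pixels = [0.0, 0.05, 0.1, 0.35, 0.45, 0.55]
target_truth = [0, 0, 0, 1, 1, 1]  # used for evaluation only, never for fitting

source_only = fit_threshold(source_pixels, source_labels)
adapted, history = self_train(source_pixels, source_labels, target_pixels)

acc_before = accuracy(predict_mask(target_pixels, source_only), target_truth)
acc_after = accuracy(predict_mask(target_pixels, adapted), target_truth)
print("accuracy before:", acc_before, "after:", acc_after)
```

In this toy setting the source-only threshold misses the darker foreground pixels of the target set, and a few rounds of refitting on pseudo labels move the threshold toward the target distribution. A real system would replace the threshold with a segmentation network and would typically need safeguards against reinforcing its own mistakes.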