IEEE Trans Image Process. 2020;29:225-236. doi: 10.1109/TIP.2019.2926748. Epub 2019 Jul 12.
Deep neural network-based semantic segmentation generally requires large-scale, costly annotations for training to achieve good performance. To avoid the pixel-wise segmentation annotations required by most methods, some researchers have recently attempted to use object-level labels (e.g., bounding boxes) or image-level labels (e.g., image categories). In this paper, we propose a novel recursive coarse-to-fine semantic segmentation framework based only on image-level category labels. For each image, an initial coarse mask is first generated by a convolutional neural network-based unsupervised foreground segmentation model and is then enhanced by a graph model. The enhanced coarse mask is fed to a fully convolutional neural network for recursive refinement. Unlike existing image-level label-based semantic segmentation methods, which require labels for all categories present in images containing multiple types of objects, our framework needs only one label per image and can still handle images containing multi-category objects. Trained only on ImageNet, our framework achieves performance on the PASCAL VOC dataset comparable to other state-of-the-art image-level label-based semantic segmentation methods. Furthermore, our framework can be easily extended to the foreground object segmentation task, achieving performance comparable to state-of-the-art supervised methods on the Internet object dataset.
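The recursive coarse-to-fine pipeline described in the abstract (unsupervised coarse mask → graph-model enhancement → recursive refinement) can be sketched as follows. This is a minimal illustrative toy, not the paper's method: `coarse_mask`, `graph_enhance`, and `recursive_refine` are hypothetical stand-ins that replace the CNN foreground model, the graph model, and the fully convolutional network with simple intensity thresholding, neighborhood smoothing, and a nearest-mean reassignment loop.

```python
import numpy as np

def coarse_mask(image, thresh=0.5):
    # Stand-in for the CNN-based unsupervised foreground model:
    # a plain intensity threshold yields the initial coarse mask.
    return (image > thresh).astype(float)

def graph_enhance(mask):
    # Stand-in for the graph-model enhancement: 3x3 neighborhood
    # averaging followed by re-thresholding smooths the mask.
    padded = np.pad(mask, 1, mode="edge")
    smoothed = sum(
        padded[1 + dy : padded.shape[0] - 1 + dy,
               1 + dx : padded.shape[1] - 1 + dx]
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
    ) / 9.0
    return (smoothed > 0.5).astype(float)

def recursive_refine(image, n_iters=3):
    # Coarse-to-fine loop: each round, the (stand-in) refinement
    # model reassigns pixels to the nearer of the current
    # foreground/background mean intensities, then re-enhances.
    mask = graph_enhance(coarse_mask(image))
    for _ in range(n_iters):
        fg_mean = image[mask == 1].mean() if mask.any() else 1.0
        bg_mean = image[mask == 0].mean() if (mask == 0).any() else 0.0
        mask = graph_enhance(
            (np.abs(image - fg_mean) < np.abs(image - bg_mean)).astype(float)
        )
    return mask

# Toy image: dark background with a bright foreground square.
rng = np.random.default_rng(0)
img = rng.uniform(0.0, 0.4, (16, 16))
img[4:12, 4:12] = rng.uniform(0.6, 1.0, (8, 8))
result = recursive_refine(img)
```

On this synthetic image the loop recovers the bright square as foreground; in the paper, each refinement round would instead retrain and re-apply the fully convolutional network on the previous round's mask.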