IEEE Trans Cybern. 2019 Dec;49(12):4243-4252. doi: 10.1109/TCYB.2018.2861419. Epub 2018 Oct 5.
This paper focuses on weakly supervised image understanding, in which semantic labels are available only at the image level, without the location of the specific object or scene in the image. Existing algorithms implicitly assume that image-level labels are error-free, which may be too restrictive. In practice, image labels obtained from pretrained predictors are easily contaminated. To address this problem, we propose a novel algorithm for weakly supervised segmentation when only noisy image labels are available during training. More specifically, a semantic space is first constructed by encoding image labels through a graphlet (i.e., superpixel cluster) embedding process. We then observe that, in the semantic space, the distribution of graphlets from images with the same label remains stable regardless of the noise in the image labels. We therefore propose a generative model, called latent stability analysis, to discover the stable patterns in images with noisy labels. Inferring graphlet semantics from these mid-level stable patterns is more reliable and accurate than directly transferring noisy image-level labels to different regions. Finally, we calculate the semantics of each superpixel by majority voting over its correlated graphlets. Comprehensive experimental results show that our algorithm performs well when the image labels are predicted from either hand-crafted or deeply learned image descriptors.
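The final step of the abstract, assigning each superpixel a label by majority voting over its correlated graphlets, can be illustrated with a minimal sketch. The abstract does not specify how graphlets are correlated with superpixels or how ties are broken, so the sketch assumes that a graphlet is correlated with exactly the superpixels it contains and that ties go to the lexicographically smallest label. The function name `vote_superpixel_labels` and the toy labels are hypothetical, not taken from the paper.

```python
from collections import Counter


def vote_superpixel_labels(graphlets, graphlet_labels):
    """Assign each superpixel the most frequent label among its graphlets.

    graphlets: list of tuples of superpixel ids (assumed graphlet membership).
    graphlet_labels: one inferred semantic label per graphlet.
    Ties are broken by the lexicographically smallest label (an assumption).
    """
    votes = {}
    for members, label in zip(graphlets, graphlet_labels):
        for superpixel in members:
            # Each graphlet casts one vote for every superpixel it contains.
            votes.setdefault(superpixel, Counter())[label] += 1

    result = {}
    for superpixel, counter in votes.items():
        # Highest count first, then smallest label for a deterministic tie-break.
        best_label, _ = min(counter.items(), key=lambda kv: (-kv[1], kv[0]))
        result[superpixel] = best_label
    return result


# Toy example: four graphlets over four superpixels.
toy_graphlets = [(0, 1), (1, 2), (1, 2, 3), (3,)]
toy_labels = ["sky", "tree", "tree", "grass"]
toy_result = vote_superpixel_labels(toy_graphlets, toy_labels)
```

In this toy input, superpixel 1 receives one vote for "sky" and two for "tree", so it is labeled "tree". Superpixel 3 receives one vote each for "tree" and "grass", and the assumed tie-break selects "grass".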