Zha Zheng-Jun, Wang Chong, Liu Dong, Xie Hongtao, Zhang Yongdong
IEEE Trans Neural Netw Learn Syst. 2020 Jul;31(7):2398-2408. doi: 10.1109/TNNLS.2020.2967471. Epub 2020 Feb 13.
High-level semantic knowledge, in addition to low-level visual cues, is crucial for co-saliency detection. This article proposes a novel end-to-end deep learning approach for robust co-saliency detection that simultaneously learns a high-level groupwise semantic representation and deep visual features of a given image group. Inter-image interaction at the semantic level and the complementarity between group semantics and visual features are exploited to boost the inference of co-salient regions. Specifically, the proposed approach consists of a co-category learning branch and a co-saliency detection branch. The former learns a groupwise semantic vector using the co-category association of an image group as supervision, while the latter infers precise co-salient maps from the ensemble of group-semantic knowledge and deep visual cues. The group-semantic vector augments the visual features at multiple scales and acts as top-down semantic guidance for the bottom-up inference of co-saliency. Moreover, we develop a pyramidal attention (PA) module that endows the network with the capability to concentrate on important image patches and suppress distractions. The co-category learning and co-saliency detection branches are jointly optimized in a multitask learning manner, further improving the robustness of the approach. We construct a new large-scale co-saliency data set, COCO-SEG, to facilitate research on co-saliency detection. Extensive experimental results on COCO-SEG and the widely used Cosal2015 benchmark demonstrate the superiority of the proposed approach over state-of-the-art methods.
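The core augmentation step described in the abstract — tiling a groupwise semantic vector spatially and fusing it with visual features at multiple scales — can be illustrated with a minimal sketch. This is not the authors' implementation; the function name, tensor shapes, and channel-wise concatenation as the fusion operation are all assumptions for illustration.

```python
import numpy as np

def augment_with_group_semantics(feature_map, group_vec):
    """Fuse a groupwise semantic vector with a visual feature map.

    feature_map: (C, H, W) visual features at one scale.
    group_vec:   (S,) groupwise semantic vector shared by the image group.
    Returns an (C + S, H, W) map: the vector is tiled over all spatial
    positions and concatenated channel-wise (one plausible fusion choice).
    """
    S = group_vec.shape[0]
    _, H, W = feature_map.shape
    tiled = np.broadcast_to(group_vec[:, None, None], (S, H, W))
    return np.concatenate([feature_map, tiled], axis=0)

# Hypothetical multi-scale features for one image of the group,
# augmented by the same group-semantic vector at every scale.
rng = np.random.default_rng(0)
feats = [rng.random((64, s, s)) for s in (8, 16, 32)]
group_vec = rng.random(10)
augmented = [augment_with_group_semantics(f, group_vec) for f in feats]
```

Because the same vector is injected at every scale, it acts as the top-down guidance the abstract describes, while the per-image feature maps carry the bottom-up visual cues.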