Liu Nian, Han Junwei, Yang Ming-Hsuan
IEEE Trans Image Process. 2020 Apr 23. doi: 10.1109/TIP.2020.2988568.
Existing saliency models typically incorporate contexts holistically. However, for each pixel, usually only part of its context region contributes to saliency prediction, while the other parts are likely noise or distractions. In this paper, we propose a novel pixel-wise contextual attention network (PiCANet) that selectively attends to informative context locations at each pixel. The proposed PiCANet generates an attention map over the contextual region of each pixel and constructs attentive contextual features by selectively incorporating the features of useful context locations. We present three formulations of the PiCANet by embedding the pixel-wise contextual attention mechanism into pooling and convolution operations that attend to global or local contexts. All three formulations are fully differentiable and can be integrated into convolutional neural networks and trained jointly. In this work, we embed the proposed PiCANets into a U-Net model for salient object detection. The generated global and local attention maps learn to incorporate global contrast and regional smoothness, which helps localize and highlight salient objects more accurately and uniformly. Experimental results show that the proposed PiCANets perform favorably against state-of-the-art methods for saliency detection. Furthermore, we demonstrate the effectiveness and generalization ability of the PiCANets on semantic segmentation and object detection, where they improve performance.
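The attention-weighted pooling described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It covers only a local-context pooling variant and assumes the per-pixel attention scores are already given, whereas in the paper they are produced by learned network layers. The function names `softmax` and `local_attentive_pooling`, the `radius` parameter, and the choice to skip out-of-bounds locations rather than pad them are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Normalize a list of scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def local_attentive_pooling(features, logits, radius=1):
    """Attention-weighted pooling over a local window around each pixel.

    features: H x W grid; each cell is a list of C channel values.
    logits:   H x W grid; each cell is a list of (2*radius+1)**2
              unnormalized attention scores, one per window position
              in row-major order (hypothetical input, assumed given).
    Returns an H x W grid of attended feature vectors.
    """
    height, width = len(features), len(features[0])
    channels = len(features[0][0])
    out = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
    for i in range(height):
        for j in range(width):
            context, scores = [], []
            k = 0  # index into this pixel's window of scores
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    y, x = i + di, j + dj
                    # Assumption: locations outside the image are skipped.
                    if 0 <= y < height and 0 <= x < width:
                        context.append(features[y][x])
                        scores.append(logits[i][j][k])
                    k += 1
            # Softmax over the in-bounds context gives this pixel's attention.
            weights = softmax(scores)
            # Attended feature = weighted sum of the context features.
            for weight, feat in zip(weights, context):
                for ch in range(channels):
                    out[i][j][ch] += weight * feat[ch]
    return out
```

With uniform scores the sketch reduces to average pooling over the in-bounds window, while a strongly peaked score makes a pixel copy the feature of a single context location. This is the sense in which each pixel selectively attends to its context.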