He Xingjian, Liu Jing, Wang Weining, Lu Hanqing
IEEE Trans Image Process. 2022;31:2850-2863. doi: 10.1109/TIP.2022.3162101. Epub 2022 Apr 5.
Self-attention has been widely explored to model long-range dependencies in semantic segmentation. However, this operation computes pair-wise relationships between each query point and all other points, leading to prohibitive computational complexity. In this paper, we propose an efficient Sampling-based Attention Network which combines a novel sampling method with an attention mechanism for semantic segmentation. Specifically, we design a Stochastic Sampling-based Attention Module (SSAM) to capture the relationships between the query point and a stochastically sampled representative subset of points from a global perspective, where the sampled subset is selected by a Stochastic Sampling Module. Compared to self-attention, our SSAM achieves comparable segmentation performance while significantly reducing computational redundancy. In addition, motivated by the observation that not all pixels require global contextual information, we design a Deterministic Sampling-based Attention Module (DSAM) that samples features from a local region to obtain detailed information. Extensive experiments demonstrate that our proposed method competes with or performs favorably against state-of-the-art methods on the Cityscapes, ADE20K, COCO Stuff, and PASCAL Context datasets.
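To illustrate the complexity argument, the following is a minimal PyTorch sketch of sampling-based attention: each query attends to a randomly sampled subset of S key/value positions instead of all N positions, reducing the attention cost from O(N^2) to O(N*S). This is an assumed simplification for illustration only, not the authors' SSAM/DSAM implementation; the class name SampledAttention and the sample_ratio parameter are hypothetical.

```python
import torch
import torch.nn as nn


class SampledAttention(nn.Module):
    """Attention over a random subset of positions (illustrative sketch,
    not the paper's SSAM). Cost per query drops from O(N) to O(S)."""

    def __init__(self, channels, sample_ratio=0.25):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.sample_ratio = sample_ratio

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w
        s = max(1, int(n * self.sample_ratio))           # number of sampled positions

        q = self.query(x).flatten(2).transpose(1, 2)     # (B, N, C//8)
        k = self.key(x).flatten(2)                       # (B, C//8, N)
        v = self.value(x).flatten(2)                     # (B, C, N)

        # Stochastically sample S key/value positions, shared by all queries.
        idx = torch.randperm(n, device=x.device)[:s]     # (S,)
        k = k[:, :, idx]                                 # (B, C//8, S)
        v = v[:, :, idx]                                 # (B, C, S)

        attn = torch.softmax(q @ k, dim=-1)              # (B, N, S) affinities
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)  # aggregate sampled values
        return out + x                                   # residual connection
```

With sample_ratio = 0.25, each query attends to a quarter of the spatial positions, which conveys how sampling trades exhaustive pair-wise computation for a representative subset, as the abstract describes.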