Guo Meng-Hao, Liu Zheng-Ning, Mu Tai-Jiang, Hu Shi-Min
IEEE Trans Pattern Anal Mach Intell. 2023 May;45(5):5436-5447. doi: 10.1109/TPAMI.2022.3211006. Epub 2023 Apr 3.
Attention mechanisms, especially self-attention, have played an increasingly important role in deep feature representation for visual tasks. Self-attention updates the feature at each position by computing a weighted sum of features using pair-wise affinities across all positions, capturing long-range dependencies within a single sample. However, self-attention has quadratic complexity and ignores potential correlation between different samples. This article proposes a novel attention mechanism, which we call external attention, based on two external, small, learnable, shared memories; it can be implemented simply with two cascaded linear layers and two normalization layers, and it conveniently replaces self-attention in existing popular architectures. External attention has linear complexity and implicitly considers the correlations between all data samples. We further incorporate the multi-head mechanism into external attention to provide an all-MLP architecture, external attention MLP (EAMLP), for image classification. Extensive experiments on image classification, object detection, semantic segmentation, instance segmentation, image generation, and point cloud analysis show that our method provides results comparable or superior to those of the self-attention mechanism and some of its variants, with much lower computational and memory costs.
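To make the mechanism concrete, the following is a minimal PyTorch sketch of external attention as described in the abstract: two learnable memories (key and value), each realized as a linear layer, with a double normalization step in between. The class name, the memory size S=64, and the epsilon term are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExternalAttention(nn.Module):
    """Sketch of external attention: two small, learnable, shared
    memories M_k and M_v, implemented as cascaded linear layers."""

    def __init__(self, d_model: int, s: int = 64):
        super().__init__()
        # External memory units, independent of any individual input
        self.mk = nn.Linear(d_model, s, bias=False)  # d -> S (key memory)
        self.mv = nn.Linear(s, d_model, bias=False)  # S -> d (value memory)

    def forward(self, x):                 # x: (B, N, d)
        attn = self.mk(x)                 # (B, N, S): affinities to memory slots
        # Double normalization: softmax over the N positions,
        # then l1-normalization over the S memory slots
        attn = F.softmax(attn, dim=1)
        attn = attn / (attn.sum(dim=2, keepdim=True) + 1e-9)
        return self.mv(attn)              # (B, N, d)


# Usage: cost is O(N * S * d), i.e., linear in the number of positions N
ea = ExternalAttention(d_model=512, s=64)
x = torch.randn(2, 1024, 512)
y = ea(x)                                 # (2, 1024, 512)
```

Because the memories are shared across all inputs rather than computed per sample, attention is linear in the number of positions and can implicitly capture correlations across the whole dataset, as the abstract states.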