IEEE Trans Pattern Anal Mach Intell. 2023 Jun;45(6):6881-6895. doi: 10.1109/TPAMI.2020.3047209. Epub 2023 May 5.
The non-local network (NLNet) presents a pioneering approach for capturing long-range dependencies within an image, by aggregating query-specific global context to each query position. However, through a rigorous empirical analysis, we have found that the global contexts modeled by the non-local network are almost the same for different query positions. In this paper, we take advantage of this finding to create a simplified network based on a query-independent formulation, which maintains the accuracy of NLNet but with significantly less computation. We further replace the one-layer transformation function of the non-local block with a two-layer bottleneck, which reduces the number of parameters considerably. The resulting network element, called the global context (GC) block, effectively models global context in a lightweight manner, allowing it to be applied at multiple layers of a backbone network to form a global context network (GCNet). Experiments show that GCNet generally outperforms NLNet on major benchmarks for various recognition tasks. The code and network configurations are available at https://github.com/xvjiarui/GCNet.
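The abstract's three-step design (query-independent context modeling, a two-layer bottleneck transform, and broadcast fusion) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the weight names `wk`, `w1`, `w2`, the ReLU nonlinearity, and the omission of layer normalization inside the bottleneck are simplifying assumptions for clarity.

```python
import numpy as np

def softmax(v):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(v - v.max())
    return e / e.sum()

def gc_block(x, wk, w1, w2):
    """Sketch of a global context (GC) block on a flattened feature map.

    x:  (C, N) features, C channels over N spatial positions.
    wk: (C,)   attention weights (stands in for a 1x1 conv) -- assumed name.
    w1: (C//r, C) and w2: (C, C//r): the two-layer bottleneck transform
        with reduction ratio r (normalization omitted for brevity).
    """
    # Context modeling: a single attention map shared by ALL query
    # positions -- the query-independent simplification of NLNet.
    attn = softmax(wk @ x)                 # (N,) weights over positions
    context = x @ attn                     # (C,) one global context vector
    # Transform: two-layer bottleneck (down-project, ReLU, up-project),
    # which cuts parameters from C*C to 2*C*C/r.
    t = w2 @ np.maximum(w1 @ context, 0.0) # (C,)
    # Fusion: broadcast-add the same transformed context to every position.
    return x + t[:, None]
```

Because `context` is computed once rather than per query, the block costs O(CN) for context modeling instead of NLNet's O(N^2) pairwise attention, which is what makes it cheap enough to insert at multiple backbone layers.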