Geng Qichuan, Zhang Hong, Qi Xiaojuan, Huang Gao, Yang Ruigang, Zhou Zhong
IEEE Trans Image Process. 2021;30:2436-2449. doi: 10.1109/TIP.2020.3046921. Epub 2021 Feb 1.
Semantic segmentation is a challenging task that must handle large variations in scale, deformations, and different viewpoints. In this paper, we develop a novel network named Gated Path Selection Network (GPSNet), which aims to adaptively select receptive fields while maintaining dense sampling capability. In GPSNet, we first design a two-dimensional SuperNet, which densely incorporates features from growing receptive fields. We then introduce a Comparative Feature Aggregation (CFA) module to dynamically aggregate discriminative semantic context. In contrast to previous works that focus on optimizing sparse sampling locations on regular grids, GPSNet can adaptively harvest free-form dense semantic context information. The derived adaptive receptive fields and dense sampling locations are data-dependent and flexible, so they can model the varied contexts of objects. On two representative semantic segmentation datasets, Cityscapes and ADE20K, we show that the proposed approach consistently outperforms previous methods without bells and whistles.
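The idea of softly selecting among receptive fields at each position can be illustrated with a toy sketch. This is not the GPSNet architecture: the abstract does not specify the layout of the SuperNet or the internals of the CFA module. The sketch assumes a 1D signal, three dilated-convolution branches with fixed weights, and a softmax gate per position; the function names (`dilated_conv1d`, `gated_aggregate`) and the choice of gate logits are hypothetical and for illustration only.

```python
import math


def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1D dilated convolution with zero padding (odd-length kernel)."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * dilation  # larger dilation -> wider receptive field
            if 0 <= j < n:
                acc += w * signal[j]
        out.append(acc)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_aggregate(branches, gate_logits):
    """At each position, take a softmax-weighted sum of the branch outputs."""
    n = len(branches[0])
    fused = []
    for i in range(n):
        gates = softmax([g[i] for g in gate_logits])
        fused.append(sum(w * b[i] for w, b in zip(gates, branches)))
    return fused


signal = [0.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 1.0]
kernel = [1.0, 1.0, 1.0]
dilations = [1, 2, 4]  # assumed rates; each branch sees a larger receptive field

branches = [dilated_conv1d(signal, kernel, d) for d in dilations]

# Assumption: the branch responses double as gate logits. In a trained
# network the gates would come from learned, data-dependent layers.
gate_logits = branches

fused = gated_aggregate(branches, gate_logits)
```

Because the gates at each position are non-negative and sum to one, every fused value is a convex combination of the branch responses at that position, so the effective receptive field can differ from position to position depending on the input.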