IEEE Trans Image Process. 2020;29(1):2066-2077. doi: 10.1109/TIP.2019.2941644. Epub 2019 Oct 22.
Semantic image segmentation is an important yet unsolved problem. One of the major challenges is the large variability of object scales. To tackle this scale problem, we propose a Scale-Adaptive Network (SAN) consisting of multiple branches, each of which handles the segmentation of objects within a certain range of scales. Given an image, SAN first computes a dense scale map that indicates the scale of each pixel, automatically determined by the size of the object enclosing it. The features of the different branches are then fused according to the scale map to generate the final segmentation map. To ensure that each branch indeed learns features for a specific scale, we propose a scale-induced ground-truth map and enforce a scale-aware segmentation loss on the corresponding branch in addition to the final loss. Extensive experiments on the PASCAL-Person-Part, PASCAL VOC 2012, and Look into Person datasets demonstrate that SAN handles the large variability of object scales and outperforms state-of-the-art semantic segmentation methods.
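The scale-adaptive fusion described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the array shapes, the `fuse_branches` helper, and the use of a per-pixel softmax over branches to form soft scale weights are all assumptions made for illustration; the paper derives its scale map from the size of each pixel's enclosing object.

```python
import numpy as np

def fuse_branches(branch_feats, scale_map):
    """Fuse per-branch features with per-pixel scale weights.

    branch_feats: (S, C, H, W) -- one feature map per scale branch (hypothetical layout)
    scale_map:    (S, H, W)    -- soft per-pixel weights over the S branches,
                                  assumed to sum to 1 over the S axis
    returns:      (C, H, W)    -- fused feature map
    """
    # Broadcast the weights over the channel axis and sum over branches,
    # so each pixel's features come mostly from the branch matching its scale.
    return (branch_feats * scale_map[:, None, :, :]).sum(axis=0)

# Toy example: 3 scale branches, 2 feature channels, a 4x4 spatial grid.
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 2, 4, 4))
logits = rng.standard_normal((3, 4, 4))
# Per-pixel softmax over the branch axis yields soft scale weights (an assumption;
# the paper predicts the scale map from the enclosing-object size).
weights = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)
fused = fuse_branches(feats, weights)
print(fused.shape)  # (2, 4, 4)
```

With hard (one-hot) weights this reduces to selecting, per pixel, the single branch responsible for that pixel's scale; soft weights let neighboring scales blend smoothly.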