IEEE Trans Pattern Anal Mach Intell. 2020 May;42(5):1228-1242. doi: 10.1109/TPAMI.2019.2893630. Epub 2019 Jan 18.
Very deep convolutional neural networks (CNNs) have recently shown outstanding performance in object recognition and have become the first choice for dense prediction problems such as semantic segmentation and depth estimation. However, repeated subsampling operations such as pooling or strided convolution in deep CNNs significantly reduce the initial image resolution. Here, we present RefineNet, a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling path to enable high-resolution prediction via long-range residual connections. In this way, deeper layers that capture high-level semantic features can be refined directly using fine-grained features from earlier convolutions. The individual components of RefineNet employ residual connections following the identity-mapping design, which allows for effective end-to-end training. Further, we introduce chained residual pooling, which captures rich background context in an efficient manner. We carry out comprehensive experiments on semantic segmentation, a dense classification problem, and achieve good performance on seven public datasets. We further apply our method to depth estimation and demonstrate its effectiveness on dense regression problems.
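The chained residual pooling idea mentioned above can be illustrated with a minimal NumPy sketch: a chain of stride-1 pooling blocks whose outputs are successively summed back onto the input via residual connections. Note this is an assumption-laden simplification for intuition only: the learned convolution that follows each pooling stage in the actual RefineNet block is omitted here (treated as identity), and the function names (`max_pool_same`, `chained_residual_pooling`) are illustrative, not from the authors' code.

```python
import numpy as np

def max_pool_same(x, k=5):
    """Stride-1 max pooling with 'same' padding on a 2-D feature map.

    Stride 1 keeps the spatial resolution, so the pooled map can be
    added back to the input as a residual.
    """
    H, W = x.shape
    pad = k // 2
    xp = np.pad(x, pad, mode="constant", constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = xp[i:i + k, j:j + k].max()
    return out

def chained_residual_pooling(x, n_blocks=2):
    """Chain of pooling blocks; each block's output is summed onto the result.

    Each stage pools the *previous stage's* output, so later stages see an
    increasingly large effective window (rich background context), while the
    residual sums keep the original activations intact. The per-stage
    convolution from the paper is omitted (identity) in this sketch.
    """
    out = x.copy()
    pooled = x
    for _ in range(n_blocks):
        pooled = max_pool_same(pooled)
        out = out + pooled
    return out
```

Because every pooling stage preserves resolution, the output has the same shape as the input, which is what lets the block sit inside a refinement path without further up- or down-sampling.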