Qiao Kai, Chen Jian, Wang Linyuan, Zeng Lei, Yan Bin
National Digital Switching System Engineering and Technological Research Centre, Zhengzhou, China.
PLoS One. 2017 Mar 24;12(3):e0174508. doi: 10.1371/journal.pone.0174508. eCollection 2017.
Given their powerful feature representations for recognition, deep convolutional neural networks (DCNNs) have been driving rapid advances in high-level computer vision tasks. However, their performance in semantic image segmentation is still not satisfactory. Based on an analysis of the visual mechanism, we conclude that purely bottom-up DCNNs are not enough, because the semantic image segmentation task requires not only recognition but also visual attention capability. In this study, superpixels containing visual attention information are introduced in a top-down manner, and an extensible architecture is proposed to improve the segmentation results of current DCNN-based methods. We employ the current state-of-the-art fully convolutional network (FCN) and FCN with a conditional random field (DeepLab-CRF) as baselines to validate our architecture. Experimental results on the PASCAL VOC segmentation task qualitatively show that coarse edges and erroneous segmentation results are well improved. We also quantitatively obtain an intersection-over-union (IoU) accuracy improvement of about 2%-3% on the PASCAL VOC 2011 and 2012 test sets.
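The abstract does not spell out the refinement architecture, so the sketch below is only a rough illustration of the general idea it describes: snapping a DCNN label map to superpixel boundaries and scoring the result with the PASCAL VOC mean-IoU metric. The use of SLIC superpixels, the majority-vote rule, and all function names and parameters here are illustrative assumptions, not the authors' actual method.

```python
import numpy as np
from skimage.segmentation import slic  # SLIC superpixels (assumed choice; paper's method may differ)

def refine_with_superpixels(pred, image, n_segments=600, compactness=10.0):
    """Majority-vote a DCNN label map inside each SLIC superpixel.

    pred:  (H, W) int array of per-pixel class labels from the FCN.
    image: (H, W, 3) RGB image used to compute the superpixels.
    Returns a refined (H, W) label map that is constant within each
    superpixel, which tends to sharpen coarse object edges.
    """
    segments = slic(image, n_segments=n_segments, compactness=compactness)
    refined = pred.copy()
    for sp in np.unique(segments):
        mask = segments == sp
        labels, counts = np.unique(pred[mask], return_counts=True)
        refined[mask] = labels[np.argmax(counts)]  # most frequent label wins
    return refined

def mean_iou(pred, gt, n_classes):
    """Mean intersection over union (IoU), the PASCAL VOC segmentation metric."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return float(np.mean(ious))
```

Under this reading, the DCNN supplies bottom-up recognition while the superpixels act as a top-down grouping prior; comparing `mean_iou` before and after refinement is one way the reported 2%-3% gain could be measured.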