Guanbin Li, Yizhou Yu
IEEE Trans Neural Netw Learn Syst. 2018 Dec;29(12):6038-6051. doi: 10.1109/TNNLS.2018.2817540. Epub 2018 Apr 12.
Deep convolutional neural networks (CNNs) have become a key element in the recent breakthrough of salient object detection. However, existing CNN-based methods rely on either patchwise (regionwise) training and inference or fully convolutional networks. Methods in the former category are generally time-consuming due to severe storage and computational redundancies among overlapping patches. To overcome this deficiency, methods in the latter category attempt to directly map a raw input image to a dense saliency map in a single network forward pass. Though very efficient, these methods struggle to detect salient objects of different scales or salient regions with weak semantic information. In this paper, we develop hybrid contrast-oriented deep neural networks to overcome the aforementioned limitations. Each of our deep networks is composed of two complementary components: a fully convolutional stream for dense prediction and a segment-level spatial pooling stream for sparse saliency inference. We further propose an attentional module that learns weight maps for fusing the saliency predictions from these two streams. A tailored alternate scheme is designed to train these deep networks by fine-tuning pretrained baseline models. Finally, a customized fully connected conditional random field model incorporating a salient contour feature embedding can optionally be applied as a postprocessing step to improve spatial coherence and contour positioning in the fused result from the two streams. Extensive experiments on six benchmark data sets demonstrate that our proposed model significantly outperforms the state of the art on all popular evaluation metrics.
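The attentional fusion described above can be illustrated with a minimal sketch: a learned per-pixel weight map forms a convex combination of the dense (fully convolutional) and sparse (segment-level) saliency predictions. The function names, the two-way softmax over weight logits, and the toy inputs below are illustrative assumptions, not the authors' implementation.

```python
import math

def softmax2(a, b):
    """Numerically stable two-way softmax; returns the weight assigned to `a`."""
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb)

def fuse(dense_map, sparse_map, weight_logits):
    """Per-pixel convex combination of two saliency predictions.

    dense_map, sparse_map: 2-D lists of saliency scores in [0, 1].
    weight_logits: 2-D list of (logit_dense, logit_sparse) pairs, standing in
    for the attention module's learned weight-map output (an assumption here).
    """
    fused = []
    for row_d, row_s, row_w in zip(dense_map, sparse_map, weight_logits):
        fused_row = []
        for d, s, (ld, ls) in zip(row_d, row_s, row_w):
            w = softmax2(ld, ls)  # attention weight for the dense stream
            fused_row.append(w * d + (1 - w) * s)
        fused.append(fused_row)
    return fused

# Toy 2x2 example: equal logits reduce fusion to a simple average.
dense = [[0.9, 0.2], [0.4, 0.8]]
sparse = [[0.5, 0.1], [0.6, 0.7]]
logits = [[(2.0, 0.0), (0.0, 0.0)], [(0.0, 2.0), (1.0, 1.0)]]
out = fuse(dense, sparse, logits)
```

Because the weights come from a softmax, the fused score always stays within the range spanned by the two input predictions, so the attention module can only interpolate between the streams, never extrapolate beyond them.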