IEEE Trans Neural Netw Learn Syst. 2017 Mar;28(3):690-703. doi: 10.1109/TNNLS.2016.2522428. Epub 2016 Feb 16.
Hierarchical neural networks have been shown to be effective in learning representative image features and recognizing object classes. However, most existing networks combine low- and mid-level cues for classification without accounting for spatial structure. For applications such as scene understanding, how the visual cues are spatially distributed in an image is essential for successful analysis. This paper extends the framework of deep neural networks by accounting for the structural cues in visual signals. In particular, we propose two kinds of neural networks. First, we develop a multitask deep convolutional network that simultaneously detects the presence of the target and estimates its geometric attributes (location and orientation) with respect to the region of interest. Second, we adopt a recurrent neuron layer for structured visual detection. The recurrent neurons can handle the spatial distribution of visible cues belonging to an object whose shape or structure is difficult to define explicitly. Both networks are demonstrated on the practical task of detecting lane boundaries in traffic scenes. The multitask convolutional neural network provides auxiliary geometric information that helps the subsequent modeling of the given lane structures. The recurrent neural network automatically detects lane boundaries, including in areas that contain no markings, without any explicit prior knowledge or secondary modeling.
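The multitask idea in the abstract, one network that reports both whether a target is present and its location and orientation within the region of interest, can be illustrated with a joint training objective. The abstract does not state the loss, so the following is only a minimal sketch under assumptions: binary cross-entropy for the presence output, squared error for the geometric outputs, and a geometry term that is counted only when a target is actually present. The names `multitask_loss` and `geometry_weight` are hypothetical and do not come from the paper.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def multitask_loss(presence_logit, pred_geometry, has_target,
                   true_geometry, geometry_weight=1.0):
    """Joint loss for a single region of interest (illustrative assumption).

    presence_logit: raw score for "target present" (before the sigmoid)
    pred_geometry:  predicted (location, orientation) pair
    has_target:     ground-truth presence flag
    true_geometry:  ground-truth (location, orientation), ignored if no target
    geometry_weight: assumed trade-off between the two tasks
    """
    eps = 1e-12  # guards against log(0)
    p = sigmoid(presence_logit)
    y = 1.0 if has_target else 0.0

    # Detection task: binary cross-entropy on the presence output.
    detection = -(y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps))

    # Geometry task: squared error, counted only when a target exists,
    # because location and orientation are undefined for empty regions.
    geometry = 0.0
    if has_target:
        geometry = sum((a - b) ** 2 for a, b in zip(pred_geometry, true_geometry))

    return detection + geometry_weight * geometry
```

For an empty region the geometric prediction has no effect, so `multitask_loss(0.0, (5.0, 5.0), False, None)` reduces to the detection term alone. In a real network both outputs would share convolutional features, and the weighting between the two terms would need tuning.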