Analytics Department, ID Analytics, San Diego, CA, USA.
Statistical and Visual Computing Lab, Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA.
Front Comput Neurosci. 2014 Sep 9;8:109. doi: 10.3389/fncom.2014.00109. eCollection 2014.
The benefits of integrating attention and object recognition are investigated. While attention is frequently modeled as a pre-processor for recognition, we investigate the hypothesis that attention is an intrinsic component of recognition and vice-versa. This hypothesis is tested with a recognition model, the hierarchical discriminant saliency network (HDSN), whose layers are top-down saliency detectors, tuned for a visual class according to the principles of discriminant saliency. As a model of neural computation, the HDSN has two possible implementations. In a biologically plausible implementation, all layers comply with the standard neurophysiological model of visual cortex, with sub-layers of simple and complex units that implement a combination of filtering, divisive normalization, pooling, and non-linearities. In a convolutional neural network implementation, all layers are convolutional and implement a combination of filtering, rectification, and pooling. The rectification is performed with a parametric extension of the now popular rectified linear units (ReLUs), whose parameters can be tuned for the detection of target object classes. This enables a number of functional enhancements over neural network models that lack a connection to saliency, including optimal feature denoising mechanisms for recognition, modulation of saliency responses by the discriminant power of the underlying features, and the ability to detect both feature presence and absence. In either implementation, each layer has a precise statistical interpretation, and all parameters are tuned by statistical learning. Each saliency detection layer learns more discriminant saliency templates than its predecessors and higher layers have larger pooling fields. This enables the HDSN to simultaneously achieve high selectivity to target object classes and invariance. 
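The convolutional implementation described above combines filtering, rectification, and pooling in each layer, with the rectification performed by a parametric extension of the ReLU. The following sketch illustrates that three-stage structure in NumPy. The threshold-shifted rectifier used here is only a stand-in: the paper derives its parametric form from discriminant saliency statistics, and all names (`saliency_layer`, `parametric_relu`, `theta`) are illustrative, not the authors' code.

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive 'valid'-mode 2D cross-correlation (filtering stage; for illustration only)."""
    h, w = x.shape
    kh, kw = k.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def parametric_relu(z, theta):
    """Rectification stage: a ReLU with a tunable threshold theta.
    This shift is a hypothetical simplification of the paper's parametric ReLU,
    whose parameters are tuned for the detection of target object classes."""
    return np.maximum(0.0, z - theta)

def max_pool(z, size=2):
    """Pooling stage: non-overlapping max pooling over size x size blocks."""
    h, w = (z.shape[0] // size) * size, (z.shape[1] // size) * size
    z = z[:h, :w]
    return z.reshape(h // size, size, w // size, size).max(axis=(1, 3))

def saliency_layer(image, filt, theta, pool_size=2):
    """One filtering -> rectification -> pooling stage, the combination
    each convolutional HDSN layer implements; higher layers would use
    larger pooling fields."""
    return max_pool(parametric_relu(conv2d_valid(image, filt), theta), pool_size)
```

As a usage example, applying `saliency_layer` to an 8x8 input with a 3x3 filter and `pool_size=2` yields a 3x3 non-negative response map; stacking such layers with growing pooling fields mirrors the selectivity/invariance trade-off the abstract describes.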
The performance of the network in saliency and object recognition tasks is compared to that of models from the biological and computer vision literatures. This demonstrates the benefits of all the functional enhancements of the HDSN, of the class tuning inherent to discriminant saliency, and of saliency layers based on templates of increasing target selectivity and invariance. Altogether, these experiments suggest that there are non-trivial benefits in integrating attention and recognition.