Nagano Station, Japan Broadcasting Corporation, 210-2, Inaba, Nagano-City, 380-8502, Japan.
Neural Netw. 2011 Mar;24(2):148-58. doi: 10.1016/j.neunet.2010.10.004. Epub 2010 Oct 27.
We propose a two-stage learning method that implements occluded visual scene analysis in a generative model, a type of hierarchical neural network with bi-directional synaptic connections. Here, top-down connections simulate forward optics to generate predictions for the sensory-driven low-level representation, whereas bottom-up connections send the prediction error, the difference between the sensory-based and the predicted low-level representation, to higher areas. The prediction error is then used to update the high-level representation to obtain better agreement with the visual scene. Although actual forward optics is highly nonlinear, and the accuracy of the simulated forward optics is crucial for these types of models, the majority of previous studies have investigated only linear, simplified cases of forward optics. Here we take occluded vision as an example of nonlinear forward optics, in which an object in front completely masks out the object behind it. Our two-stage learning method is inspired by the staged development of infant visual capacity. In the primary learning stage, a minimal set of object bases is acquired within a linear generative model using a conventional unsupervised learning scheme. In the secondary learning stage, an auxiliary multi-layer neural network is trained by supervised learning to acquire the nonlinear forward optics. The important point is that the high-level representation of the linear generative model serves as the input, and the sensory-driven low-level representation provides the desired output. Numerical simulations show that occluded visual scene analysis can indeed be implemented by the proposed method.
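The error-driven inference described above can be sketched minimally in NumPy, assuming a linear generative model image ≈ W @ r. The basis matrix W, the dimensions, and the learning rate here are illustrative assumptions, not the paper's actual parameters; in the paper the bases come from unsupervised learning.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 16, 4                       # pixels, number of object bases (assumed)
W = rng.normal(size=(D, K))        # stand-in for the learned object basis

def infer(image, W, steps=500, lr=0.02):
    """Top-down connections predict the low-level representation (W @ r);
    bottom-up connections carry the prediction error, which is used to
    update the high-level representation r by gradient descent."""
    r = np.zeros(W.shape[1])
    for _ in range(steps):
        error = image - W @ r      # sensory input minus top-down prediction
        r += lr * (W.T @ error)    # error-driven update of the high-level r
    return r

r_true = rng.normal(size=K)
image = W @ r_true                 # a scene the linear model can generate
r_hat = infer(image, W)            # recovered high-level representation
```

For a scene generated by the model itself, the inferred representation reproduces the input, i.e. W @ r_hat closely matches image.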
Furthermore, consideration of the input format to the multi-layer network and analysis of the hidden-layer units lead to the prediction that whole-object representations of partially occluded objects, together with complex intermediate representations arising from the nonlinear transformation from non-occluded to occluded representations, may exist in the low-level visual system of the brain.
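The nonlinearity of the occlusion forward optics can be illustrated with a toy one-dimensional scene, where the front object completely masks the object behind it. The function name and the example arrays are hypothetical, chosen only to make the masking rule concrete.

```python
import numpy as np

def occlude(front, back):
    """Nonlinear forward optics for occlusion: pixels are taken from
    the front object wherever it is present (nonzero); elsewhere the
    back object shows through."""
    return np.where(front != 0, front, back)

front = np.array([0, 0, 5, 5, 0])   # front object occupies two pixels
back  = np.array([3, 3, 3, 3, 3])   # back object spans the whole scene
scene = occlude(front, back)
```

Here scene differs from the linear superposition front + back, which is precisely why a linear generative model cannot capture occlusion and an auxiliary nonlinear network is needed.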