Department of Computer Science, University of Toronto, Toronto M5S 3G4, Canada.
Neural Comput. 2010 Jun;22(6):1473-92. doi: 10.1162/neco.2010.01-09-953.
To allow the hidden units of a restricted Boltzmann machine to model the transformation between two successive images, Memisevic and Hinton (2007) introduced three-way multiplicative interactions that use the intensity of a pixel in the first image as a multiplicative gain on a learned, symmetric weight between a pixel in the second image and a hidden unit. This creates cubically many parameters, which form a three-dimensional interaction tensor. We describe a low-rank approximation to this interaction tensor that uses a sum of factors, each of which is a three-way outer product. This approximation allows efficient learning of transformations between larger image patches. Since each factor can be viewed as an image filter, the model as a whole learns optimal filter pairs for efficiently representing transformations. We demonstrate the learning of optimal filter pairs from various synthetic and real image sequences. We also show how learning about image transformations allows the model to perform a simple visual analogy task, and we show how a completely unsupervised network trained on transformations perceives multiple motions of transparent dot patterns in the same way as humans.
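The low-rank approximation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The array names (`wx`, `wy`, `wh`), the sizes, and the omission of biases and of the hidden-unit nonlinearity are assumptions made for illustration. The sketch builds the interaction tensor as a sum of three-way outer products, W[i,j,k] = sum_f wx[i,f] * wy[j,f] * wh[k,f], and checks that the input to the hidden units can be computed from the filter responses alone, without ever forming the full tensor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration:
# pixels in the first image, pixels in the second image, hidden units, factors.
n_x, n_y, n_h, n_f = 16, 16, 8, 4

# One filter per factor for each of the three groups of units (random here,
# learned in the actual model).
wx = rng.normal(size=(n_x, n_f))
wy = rng.normal(size=(n_y, n_f))
wh = rng.normal(size=(n_h, n_f))


def full_tensor(wx, wy, wh):
    """Sum of n_f three-way outer products, giving an (n_x, n_y, n_h) tensor."""
    return np.einsum("if,jf,kf->ijk", wx, wy, wh)


def hidden_input_full(x, y, W):
    """Input to each hidden unit using the explicit three-way tensor."""
    return np.einsum("i,j,ijk->k", x, y, W)


def hidden_input_factored(x, y, wx, wy, wh):
    """Same quantity computed from per-factor filter responses only."""
    fx = x @ wx            # responses of the first-image filters, shape (n_f,)
    fy = y @ wy            # responses of the second-image filters, shape (n_f,)
    return wh @ (fx * fy)  # multiply responses per factor, then pool into hiddens


x = rng.normal(size=n_x)   # stand-in for the first image patch
y = rng.normal(size=n_y)   # stand-in for the second image patch

W = full_tensor(wx, wy, wh)
full_params = W.size                            # grows as n_x * n_y * n_h
factored_params = wx.size + wy.size + wh.size   # grows as (n_x + n_y + n_h) * n_f

same = np.allclose(hidden_input_full(x, y, W),
                   hidden_input_factored(x, y, wx, wy, wh))
```

With these toy sizes the explicit tensor has 2048 entries while the factored form has 160, which shows why the factorization makes larger image patches tractable. Each column of `wx` and `wy` plays the role of one filter in a filter pair.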