Bengio Yoshua, Mesnard Thomas, Fischer Asja, Zhang Saizheng, Wu Yuhuai
Montreal Institute for Learning Algorithms, University of Montreal, Montreal H3T 1J4, Quebec, Canada, and Canadian Institute for Advanced Research
Computer Science Department, École Normale Supérieure, Paris 75005, France
Neural Comput. 2017 Mar;29(3):555-577. doi: 10.1162/NECO_a_00934. Epub 2017 Jan 17.
We show that Langevin Markov chain Monte Carlo inference in an energy-based model with latent variables has the property that the early steps of inference, starting from a stationary point, correspond to propagating error gradients into internal layers, similar to backpropagation. The backpropagated error is with respect to output units that have received an outside driving force pushing them away from the stationary point. Backpropagated error gradients correspond to temporal derivatives with respect to the activation of hidden units. These lead to a weight update proportional to the product of the presynaptic firing rate and the temporal rate of change of the postsynaptic firing rate. Simulations and a theoretical argument suggest that this rate-based update rule is consistent with those associated with spike-timing-dependent plasticity. The ideas presented in this article could be an element of a theory for explaining how brains perform credit assignment in deep hierarchies as efficiently as backpropagation does, with neural computation corresponding to both approximate inference in continuous-valued latent variables and error backpropagation, at the same time.
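The weight update described in the abstract, proportional to the presynaptic firing rate times the temporal rate of change of the postsynaptic firing rate, can be sketched concretely. Below is a minimal NumPy illustration (not the authors' code) of the two-phase procedure the abstract describes: relax an energy-based network with latent units to a stationary point with inputs clamped, then apply a weak outside driving force to the output units and update weights during the early inference steps. The particular energy function, the tanh rate nonlinearity, the nudging strength `beta`, the symmetric weight matrix, and all step sizes are illustrative assumptions; the temporal derivative of the postsynaptic rate is approximated by a discrete difference.

```python
# Minimal sketch (illustrative assumptions, not the paper's implementation) of a
# rate-based update: Delta w_ij ~ rho(s_i) * d/dt rho(s_j).
import numpy as np

rng = np.random.default_rng(0)

n = 6                                    # 2 input, 2 hidden, 2 output units
idx_in, idx_out = [0, 1], [4, 5]

W = 0.1 * rng.standard_normal((n, n))
W = (W + W.T) / 2                        # symmetric weights, no self-connections
np.fill_diagonal(W, 0.0)

rho = np.tanh                            # firing-rate nonlinearity (assumed)
drho = lambda s: 1.0 - np.tanh(s) ** 2

def grad_energy(s, W):
    """Gradient of a Hopfield-style energy E(s) = 0.5*||s||^2 - 0.5*rho(s)^T W rho(s)."""
    return s - drho(s) * (W @ rho(s))

def relax(s, W, clamp_idx, clamp_val, steps=200, eps=0.05, noise=0.0):
    """Descend the energy (Langevin dynamics if noise > 0) with some units clamped."""
    for _ in range(steps):
        s = s - eps * grad_energy(s, W) + noise * rng.standard_normal(len(s))
        s[clamp_idx] = clamp_val
    return s

# Phase 1: free relaxation to a stationary point, inputs clamped.
x = np.array([0.5, -0.5])
s = relax(np.zeros(n), W, idx_in, x)

# Phase 2: weakly drive the outputs toward a target; during the early inference
# steps, update each weight by (presynaptic rate) * (change of postsynaptic rate).
target = np.array([1.0, -1.0])
lr, beta, eps = 0.01, 0.2, 0.05
for _ in range(20):
    rho_before = rho(s)
    g = grad_energy(s, W)
    g[idx_out] += beta * drho(s[idx_out]) * (rho(s[idx_out]) - target)  # driving force
    s = s - eps * g
    s[idx_in] = x
    dW = lr * np.outer(rho_before, rho(s) - rho_before)  # pre-rate x post-rate change
    dW = (dW + dW.T) / 2                                 # keep W symmetric
    np.fill_diagonal(dW, 0.0)
    W += dW

print("output rates after nudged phase:", rho(s[idx_out]))
```

The symmetrization of the update and the discrete-time difference are conveniences of this sketch; the point it illustrates is only that the early nudged inference steps propagate the output perturbation inward, and that the resulting rate-change signal drives a local, Hebbian-like weight change.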