Marc Mézard
Physics Department, École Normale Supérieure, PSL Research University, Paris.
Phys Rev E. 2017 Feb;95(2-1):022117. doi: 10.1103/PhysRevE.95.022117. Epub 2017 Feb 14.
Motivated by recent progress in using restricted Boltzmann machines as preprocessing algorithms for deep neural networks, we revisit the mean-field equations [belief-propagation and Thouless-Anderson-Palmer (TAP) equations] in the best understood of such machines, namely the Hopfield model of neural networks, and we make explicit how they can be used as iterative message-passing algorithms, providing a fast method to compute the local polarizations of neurons. In the "retrieval phase," where neurons polarize in the direction of one memorized pattern, we point out a major difference between the belief-propagation and TAP equations: the set of belief-propagation equations depends on which pattern is retrieved, whereas a single set of TAP equations suffices. This makes the latter method much better suited for applications in the learning process of restricted Boltzmann machines. When the patterns memorized in the Hopfield model are not independent but are correlated through a combinatorial structure, we show that the TAP equations must be modified. This modification can be seen either as an alteration of the reaction term in the TAP equations or, more interestingly, as the consequence of message passing on a graphical model with several hidden layers, where the number of hidden layers depends on the depth of the correlations in the memorized patterns. This layered structure is actually necessary when one deals with more general restricted Boltzmann machines.
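To illustrate the kind of iteration the abstract refers to, the following is a minimal sketch of TAP-style message passing on a Hopfield network with Hebbian couplings. The network size, pattern count, inverse temperature, damping factor, and the generic (SK-style) Onsager reaction term are all illustrative assumptions; the paper derives the Hopfield-specific form of the reaction term, which differs from the one used here.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 200, 5      # neurons and stored patterns (hypothetical sizes)
beta = 2.0         # inverse temperature, chosen inside the retrieval phase

# Hebbian couplings of the Hopfield model: J_ij = (1/N) sum_mu xi_i^mu xi_j^mu
xi = rng.choice([-1.0, 1.0], size=(P, N))
J = (xi.T @ xi) / N
np.fill_diagonal(J, 0.0)

# TAP iteration for the local polarizations m_i:
#   m_i <- tanh( beta * sum_j J_ij m_j  -  reaction term )
# Here a generic SK-style reaction term beta^2 m_i sum_j J_ij^2 (1 - m_j^2)
# stands in for the Hopfield-specific one derived in the paper.
m = 0.3 * xi[0] + 0.01 * rng.standard_normal(N)  # initialize near pattern 0
for _ in range(200):
    field = J @ m
    reaction = beta**2 * m * ((J**2) @ (1.0 - m**2))
    m_new = np.tanh(beta * field - reaction)
    m = 0.5 * m + 0.5 * m_new  # damping for stable convergence

# In the retrieval phase the polarizations align with one memorized pattern,
# so the overlap with that pattern should be close to 1.
overlap = abs(xi[0] @ m) / N
print(overlap)
```

Note that the same fixed set of couplings `J` (and hence the same set of TAP equations) is used regardless of which pattern the initial condition selects; this is the pattern-independence the abstract contrasts with belief propagation.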