Department of Statistics, University of Toronto, Toronto, Ontario M5S 3G3, Canada.
Neural Comput. 2012 Aug;24(8):1967-2006. doi: 10.1162/NECO_a_00311. Epub 2012 Apr 17.
We present a new learning algorithm for Boltzmann machines that contain many layers of hidden variables. Data-dependent statistics are estimated using a variational approximation that tends to focus on a single mode, and data-independent statistics are estimated using persistent Markov chains. The use of two quite different techniques for estimating the two types of statistic that enter into the gradient of the log likelihood makes it practical to learn Boltzmann machines with multiple hidden layers and millions of parameters. The learning can be made more efficient by using a layer-by-layer pretraining phase that initializes the weights sensibly. The pretraining also allows the variational inference to be initialized sensibly with a single bottom-up pass. We present results on the MNIST and NORB data sets showing that deep Boltzmann machines learn very good generative models of handwritten digits and 3D objects. We also show that the features discovered by deep Boltzmann machines are a very effective way to initialize the hidden layers of feedforward neural nets, which are then discriminatively fine-tuned.
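To make the two-technique gradient estimate concrete, below is a minimal numpy sketch of one training step for a two-hidden-layer binary deep Boltzmann machine: mean-field variational inference supplies the data-dependent statistics, and persistent Gibbs chains supply the data-independent statistics. The layer sizes, learning rate, chain count, iteration counts, and the omission of bias terms and of the pretraining phase are illustrative simplifications, not the paper's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: 784 visible units, two hidden layers of 500 units.
n_v, n_h1, n_h2 = 784, 500, 500
W1 = 0.01 * rng.standard_normal((n_v, n_h1))   # visible-to-hidden1 weights
W2 = 0.01 * rng.standard_normal((n_h1, n_h2))  # hidden1-to-hidden2 weights

# State of the persistent ("fantasy") Markov chains, carried across updates.
n_chains = 100
v_f  = (rng.random((n_chains, n_v))  < 0.5).astype(float)
h2_f = (rng.random((n_chains, n_h2)) < 0.5).astype(float)

def train_step(v_data, lr=0.001, mf_steps=10, gibbs_steps=5):
    global W1, W2, v_f, h2_f
    n = len(v_data)

    # Data-dependent statistics: mean-field fixed-point updates for the
    # factorized variational posterior over both hidden layers.
    mu2 = np.full((n, n_h2), 0.5)
    for _ in range(mf_steps):
        mu1 = sigmoid(v_data @ W1 + mu2 @ W2.T)
        mu2 = sigmoid(mu1 @ W2)

    # Data-independent statistics: advance the persistent Gibbs chains.
    for _ in range(gibbs_steps):
        p_h1 = sigmoid(v_f @ W1 + h2_f @ W2.T)
        h1_f = (rng.random(p_h1.shape) < p_h1).astype(float)
        h2_f = (rng.random((n_chains, n_h2)) < sigmoid(h1_f @ W2)).astype(float)
        v_f  = (rng.random((n_chains, n_v))  < sigmoid(h1_f @ W1.T)).astype(float)

    # Stochastic approximation step: data expectations minus model expectations.
    W1 += lr * (v_data.T @ mu1 / n - v_f.T @ h1_f / n_chains)
    W2 += lr * (mu1.T @ mu2 / n - h1_f.T @ h2_f / n_chains)

# Usage with a stand-in batch (e.g. binarized MNIST images in practice):
batch = (rng.random((100, n_v)) < 0.3).astype(float)
train_step(batch)
```

In the full procedure the mean-field posterior would instead be initialized with a single bottom-up recognition pass after layer-by-layer pretraining, as the abstract notes; here it starts from a uniform guess purely for brevity.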