Geoffrey Hinton
Department of Computer Science, University of Toronto.
Cogn Sci. 2014 Aug;38(6):1078-101. doi: 10.1111/cogs.12049. Epub 2013 Jun 25.
It is possible to learn multiple layers of non-linear features by backpropagating error derivatives through a feedforward neural network. This is a very effective learning procedure when there is a huge amount of labeled training data, but for many learning tasks very few labeled examples are available. In an effort to overcome the need for labeled data, several different generative models were developed that learned interesting features by modeling the higher order statistical structure of a set of input vectors. One of these generative models, the restricted Boltzmann machine (RBM), has no connections between its hidden units and this makes perceptual inference and learning much simpler. More significantly, after a layer of hidden features has been learned, the activities of these features can be used as training data for another RBM. By applying this idea recursively, it is possible to learn a deep hierarchy of progressively more complicated features without requiring any labeled data. This deep hierarchy can then be treated as a feedforward neural network which can be discriminatively fine-tuned using backpropagation. Using a stack of RBMs to initialize the weights of a feedforward neural network allows backpropagation to work effectively in much deeper networks and it leads to much better generalization. A stack of RBMs can also be used to initialize a deep Boltzmann machine that has many hidden layers. Combining this initialization method with a new method for fine-tuning the weights finally leads to the first efficient way of training Boltzmann machines with many hidden layers and millions of weights.
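The greedy layer-wise procedure the abstract describes — train an RBM with no hidden-hidden connections, then reuse its hidden-unit activities as training data for the next RBM — can be sketched in a few lines of NumPy. This is a minimal illustration using one-step contrastive divergence (CD-1) on toy binary data; the class name `RBM`, the layer sizes, and all hyperparameters are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Restricted Boltzmann machine trained with one-step
    contrastive divergence (CD-1). Illustrative sketch only."""

    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible biases
        self.b_h = np.zeros(n_hidden)   # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        # No connections between hidden units, so inference
        # factorizes: each hidden unit is conditionally independent
        # given the visible vector.
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def cd1_step(self, v0):
        # Positive phase: hidden probabilities driven by the data.
        ph0 = self.hidden_probs(v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step (reconstruct, re-infer).
        pv1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(pv1)
        # CD-1 approximation to the log-likelihood gradient.
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b_v += self.lr * (v0 - pv1).mean(axis=0)
        self.b_h += self.lr * (ph0 - ph1).mean(axis=0)
        return float(np.mean((v0 - pv1) ** 2))  # reconstruction error

# Greedy layer-wise stacking: after one RBM is trained, its hidden
# activities become the training data for the next RBM, with no
# labels needed at any stage.
data = (rng.random((200, 20)) < 0.3).astype(float)  # toy binary data
layer_sizes = [20, 12, 6]                           # assumed sizes
rbms, x = [], data
for n_v, n_h in zip(layer_sizes[:-1], layer_sizes[1:]):
    rbm = RBM(n_v, n_h)
    for epoch in range(50):
        rbm.cd1_step(x)
    rbms.append(rbm)
    x = rbm.hidden_probs(x)  # features feed the next layer
```

After this unsupervised pass, the stacked weights would initialize a feedforward network for discriminative fine-tuning with backpropagation; that supervised stage is omitted here.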