On the momentum term in gradient descent learning algorithms.

Author information

Qian Ning

Affiliation

Center for Neurobiology and Behavior, Columbia University, 722 W. 168th Street, New York, USA

Publication information

Neural Netw. 1999 Jan;12(1):145-151. doi: 10.1016/s0893-6080(98)00116-6.

Abstract

A momentum term is usually included in the simulations of connectionist learning algorithms. Although it is well known that such a term greatly improves the speed of learning, there have been few rigorous studies of its mechanisms. In this paper, I show that in the limit of continuous time, the momentum parameter is analogous to the mass of Newtonian particles that move through a viscous medium in a conservative force field. The behavior of the system near a local minimum is equivalent to a set of coupled and damped harmonic oscillators. The momentum term improves the speed of convergence by bringing some eigen components of the system closer to critical damping. Similar results can be obtained for the discrete time case used in computer simulations. In particular, I derive the bounds for convergence on learning-rate and momentum parameters, and demonstrate that the momentum term can increase the range of learning rate over which the system converges. The optimal condition for convergence is also analyzed.
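The discrete-time update the abstract analyzes can be sketched in a few lines. The following is a minimal illustration, not code from the paper: the quadratic objective, the function name `momentum_gd`, and the hyperparameter values are assumptions chosen to exhibit the standard heavy-ball convergence condition on a quadratic (for each Hessian eigenvalue λ, convergence requires 0 ≤ m < 1 and 0 < η·λ < 2(1 + m)).

```python
import numpy as np

def momentum_gd(grad, x0, lr=0.01, momentum=0.9, steps=500):
    """Gradient descent with a classical momentum term:
        v <- momentum * v - lr * grad(x)
        x <- x + v
    """
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        v = momentum * v - lr * grad(x)
        x = x + v
    return x

# Quadratic bowl f(x) = 0.5 * x^T A x with eigenvalues 1 and 100,
# a stand-in for the eigencomponent analysis near a local minimum.
A = np.diag([1.0, 100.0])
grad = lambda x: A @ x

x_min = momentum_gd(grad, x0=[1.0, 1.0], lr=0.01, momentum=0.9)
```

For the larger eigenvalue λ = 100, plain gradient descent is stable only for η·λ < 2, i.e. η < 0.02; with m = 0.9 the bound relaxes to η·λ < 2(1 + 0.9) = 3.8, i.e. η < 0.038 — a concrete instance of the abstract's claim that momentum widens the range of convergent learning rates.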

