Yu X H, Chen G A, Cheng S X
Dept. of Radio Eng., Southeast Univ., Nanjing.
IEEE Trans Neural Netw. 1995;6(3):669-77. doi: 10.1109/72.377972.
It has been observed by many authors that backpropagation (BP) error surfaces usually consist of a large number of flat regions as well as extremely steep regions. As such, the BP algorithm with a fixed learning rate will have low efficiency. This paper considers dynamic learning rate optimization of the BP algorithm using derivative information. An efficient method of deriving the first and second derivatives of the objective function with respect to the learning rate is explored, which does not involve explicit calculation of second-order derivatives in weight space, but rather uses the information gathered from the forward and backward propagation. Several learning rate optimization approaches are subsequently established based, respectively, on a linear expansion of the actual outputs, line searches with an acceptable descent value, and Newton-like methods. Simultaneous determination of the optimal learning rate and momentum is also introduced by showing the equivalence between the momentum version of BP and the conjugate gradient method. Since these approaches are constructed by simple manipulations of the obtained derivatives, the computational and storage burdens scale with the network size exactly as in the standard BP algorithm, and the convergence of the BP algorithm is accelerated, with a remarkable reduction (typically by a factor of 10 to 50, depending upon network architectures and applications) in the running time for the overall learning process. Numerous computer simulation results are provided to support the present approaches.
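To make the idea of choosing the learning rate from a linear expansion of the outputs concrete, the following is a minimal sketch and not the paper's own derivation. It assumes a single sigmoid unit trained by batch gradient descent on a sum-of-squares error. The outputs are expanded to first order in the learning rate along the negative gradient, and the rate that minimizes the resulting quadratic error is used. All names (`linearized_rate`, `train`, `XS`, `TS`), the toy OR data set, and the step-halving safeguard are illustrative assumptions.

```python
import math

# Toy data (assumption): logical OR, with a constant 1 appended as a bias input.
XS = [(0.0, 0.0, 1.0), (0.0, 1.0, 1.0), (1.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
TS = [0.0, 1.0, 1.0, 1.0]

def sigmoid(z):
    # Numerically stable logistic function (never overflows in exp).
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def forward(w, xs):
    return [sigmoid(sum(wi * xi for wi, xi in zip(w, x))) for x in xs]

def loss(w, xs, ts):
    # Sum-of-squares error E(w) = 0.5 * sum_p (y_p - t_p)^2.
    return 0.5 * sum((y - t) ** 2 for y, t in zip(forward(w, xs), ts))

def gradient(w, xs, ts):
    # Backward pass for one sigmoid unit: dE/dw = sum_p e_p * y_p * (1 - y_p) * x_p.
    g = [0.0] * len(w)
    for x, y, t in zip(xs, forward(w, xs), ts):
        delta = (y - t) * y * (1.0 - y)
        for i, xi in enumerate(x):
            g[i] += delta * xi
    return g

def linearized_rate(w, xs, ts, g):
    # First-order expansion: y_p(w - eta*g) ~ y_p - eta * d_p, where d_p is the
    # directional derivative of output p along g. Minimizing the quadratic
    # 0.5 * sum_p (e_p - eta * d_p)^2 over eta gives eta = sum(e*d) / sum(d*d).
    num = den = 0.0
    for x, y, t in zip(xs, forward(w, xs), ts):
        d = y * (1.0 - y) * sum(gi * xi for gi, xi in zip(g, x))
        num += (y - t) * d
        den += d * d
    return num / den if den > 0.0 else 0.0

def step(w, g, eta):
    return [wi - eta * gi for wi, gi in zip(w, g)]

def train(xs, ts, steps=100, fixed_rate=None):
    w = [0.0] * len(xs[0])
    for _ in range(steps):
        g = gradient(w, xs, ts)
        if fixed_rate is not None:
            eta = fixed_rate
        else:
            eta = linearized_rate(w, xs, ts, g)
            # Safeguard (assumption): halve eta until the error does not increase.
            e0 = loss(w, xs, ts)
            while eta > 1e-12 and loss(step(w, g, eta), xs, ts) > e0:
                eta *= 0.5
        w = step(w, g, eta)
    return w, loss(w, xs, ts)
```

Calling `train(XS, TS)` runs the dynamic-rate variant, while `train(XS, TS, fixed_rate=0.1)` runs plain fixed-rate gradient descent for comparison. The rate computation reuses only quantities already available from the forward and backward passes, which is in the spirit of the abstract's remark that no second-order derivatives in weight space are needed. The speedups reported in the abstract should not be expected from this toy setting.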