IEEE Trans Neural Netw Learn Syst. 2022 Jan;33(1):330-339. doi: 10.1109/TNNLS.2020.3027750. Epub 2022 Jan 5.
Optimization in a deep neural network is always challenging due to the vanishing gradient problem and the intensive fine-tuning of network hyperparameters. Inspired by multistage decision control systems, the stochastic diagonal approximate greatest descent (SDAGD) algorithm is proposed in this article to seek optimal learning weights using a two-phase switching optimization strategy. The proposed optimizer controls the relative step length derived from the long-term optimal trajectory and adopts a diagonal approximated Hessian for efficient weight updates. In Phase-I, it computes the greatest step length at the boundary of each local spherical search region and, subsequently, descends rapidly in the direction of the optimal solution. In Phase-II, once it is closer to the optimal solution, it switches automatically to an approximate Newton method to achieve fast convergence. The experiments show that SDAGD produces steeper learning curves and achieves lower misclassification rates compared with other optimization techniques. Application of the proposed optimizer to deeper networks is also investigated in this article to study the vanishing gradient problem.
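To make the two-phase idea concrete, below is a minimal sketch of one weight update in the spirit of the abstract: in Phase-I the step is constrained to the boundary of a local spherical search region (a greatest-descent-style step of length equal to the region radius), and in Phase-II the update falls back to a diagonal approximate-Newton step. The switching criterion (`phase_threshold`), the region `radius`, and the function and variable names are illustrative assumptions, not the paper's exact formulation; the full derivation of the relative step length is given in the article itself.

```python
import numpy as np

def sdagd_like_step(w, grad, h_diag, radius, phase_threshold=1e-1, eps=1e-8):
    """One hypothetical two-phase update (illustrative, not the paper's exact rule).

    w               : current weight vector
    grad            : stochastic gradient at w
    h_diag          : diagonal approximation of the Hessian at w
    radius          : radius of the local spherical search region (Phase-I)
    phase_threshold : assumed switching criterion on the gradient norm
    """
    grad_norm = np.linalg.norm(grad)
    if grad_norm > phase_threshold:
        # Phase-I: take the greatest step, landing on the boundary of the
        # local spherical search region (step length = radius).
        step = -radius * grad / (grad_norm + eps)
    else:
        # Phase-II: diagonal approximate-Newton step for fast local convergence.
        step = -grad / (np.abs(h_diag) + eps)
    return w + step

# Example usage on a toy quadratic loss 0.5 * w^T diag(h) w:
h = np.array([4.0, 1.0, 0.5])
w = np.array([2.0, -3.0, 1.5])
for _ in range(20):
    g = h * w                      # gradient of the toy loss
    w = sdagd_like_step(w, g, h, radius=0.5)
```

The toy loop above only illustrates how a large-radius descent phase can hand over to a Newton-like phase near the optimum; hyperparameter choices and the Hessian approximation in the actual SDAGD algorithm differ and are detailed in the paper.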