School of Systems Engineering, National University of Defense Technology, Changsha 410073, China.
Comput Intell Neurosci. 2021 Nov 10;2021:5790608. doi: 10.1155/2021/5790608. eCollection 2021.
In this work, we introduce AdaCN, a novel adaptive cubic Newton method for nonconvex stochastic optimization. AdaCN dynamically captures the curvature of the loss landscape through a diagonally approximated Hessian plus the norm of the difference between the previous two estimates. It requires at most first-order gradients and updates with linear complexity in both time and memory. To reduce the variance introduced by the stochastic nature of the problem, AdaCN uses the first and second moments to implement exponential moving averages over the iteratively updated stochastic gradients and the approximated stochastic Hessians, respectively. We validate AdaCN in extensive experiments, showing that it outperforms stochastic first-order methods (including SGD, Adam, and AdaBound) and a stochastic quasi-Newton method (Apollo) in both convergence speed and generalization performance.
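To make the ingredients named in the abstract concrete, the following is an illustrative sketch of an AdaCN-flavoured update, not the authors' exact algorithm: it combines a secant-style diagonal curvature estimate, exponential moving averages for the first and second moments (with Adam-style bias correction, an assumption on our part), and damping by the norm of the last step in place of the full cubic-regularization subproblem. All names (`m`, `B`, `step`, the hyperparameters) are illustrative.

```python
import numpy as np

def make_state(theta0):
    # State for the illustrative optimizer: first moment m, diagonal
    # curvature estimate B, and the previous iterate/gradient.
    return {"m": np.zeros_like(theta0), "B": np.zeros_like(theta0),
            "theta_prev": theta0.copy(), "g_prev": np.zeros_like(theta0)}

def step(theta, g, state, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment: exponential moving average of stochastic gradients.
    state["m"] = beta1 * state["m"] + (1 - beta1) * g
    # Secant-style diagonal curvature estimate: |Δg / Δθ| elementwise,
    # falling back to 1 where the previous step is too small to divide by.
    delta = theta - state["theta_prev"]
    safe = np.where(np.abs(delta) > eps, delta, 1.0)
    h = np.where(np.abs(delta) > eps, np.abs((g - state["g_prev"]) / safe), 1.0)
    # Second moment: exponential moving average of the diagonal curvature.
    state["B"] = beta2 * state["B"] + (1 - beta2) * h
    # Bias correction, as in Adam (an assumption, not stated in the abstract).
    m_hat = state["m"] / (1 - beta1 ** t)
    b_hat = state["B"] / (1 - beta2 ** t)
    # Cubic-regularization-flavoured damping via the norm of the last step.
    damping = np.linalg.norm(delta)
    state["theta_prev"], state["g_prev"] = theta.copy(), g.copy()
    return theta - lr * m_hat / (b_hat + damping + eps)

# Usage: minimize f(x) = 0.5 * x^2, whose gradient is x itself.
theta = np.array([5.0])
state = make_state(theta)
for t in range(1, 301):
    theta = step(theta, theta.copy(), state, t)
```

Note that everything here uses only first-order gradients and elementwise vector operations, which is what gives the claimed linear time and memory cost per update.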