
NALA: a Nesterov accelerated look-ahead optimizer for deep learning.

Authors

Zuo Xuan, Li Hui-Yan, Gao Shan, Zhang Pu, Du Wan-Ru

Affiliations

School of Automation, Northwestern Polytechnical University, Xi'an, Shaanxi, China.

China Academy of Aerospace Systems Science and Engineering, Beijing, China.

Publication

PeerJ Comput Sci. 2024 Jul 3;10:e2167. doi: 10.7717/peerj-cs.2167. eCollection 2024.

DOI: 10.7717/peerj-cs.2167
PMID: 38983239
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11232586/
Abstract

Adaptive gradient algorithms have been successfully used in deep learning. Previous work reveals that adaptive gradient algorithms mainly borrow the moving-average idea of heavy-ball acceleration to estimate the first- and second-order moments of the gradient for accelerating convergence. However, Nesterov acceleration, which uses the gradient at an extrapolation point, can in theory achieve a faster convergence speed than heavy-ball acceleration. In this article, a new optimization algorithm for deep learning, called NALA, is proposed; it combines an adaptive gradient algorithm with Nesterov acceleration by using a look-ahead scheme. NALA iteratively updates two sets of weights: the 'fast weights' in its inner loop and the 'slow weights' in its outer loop. Concretely, NALA first updates the fast weights k times using the Adam optimizer in the inner loop, and then updates the slow weights once in the direction of Nesterov's Accelerated Gradient (NAG) in the outer loop. We compare NALA with several popular optimization algorithms on a range of image classification tasks on public datasets. The experimental results show that NALA achieves faster convergence and higher accuracy than the other popular optimization algorithms.
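The fast/slow-weights loop described in the abstract can be sketched as follows. This is a minimal NumPy illustration of a generic look-ahead scheme, not the authors' implementation: the toy quadratic loss, the hyperparameter values, and the plain interpolation used for the outer update are all illustrative assumptions (NALA itself moves the slow weights along the NAG direction rather than by simple interpolation).

```python
import numpy as np

def grad(w):
    # Gradient of a toy quadratic loss f(w) = 0.5 * ||w||^2 (illustrative only).
    return w

def adam_step(w, g, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # One standard Adam update: moving averages of the first and second
    # moments of the gradient, with bias correction.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def lookahead(slow, k=5, outer_steps=20, alpha=0.5):
    for _ in range(outer_steps):
        # Inner loop: k Adam updates on a copy of the weights (the "fast weights").
        fast = slow.copy()
        m = np.zeros_like(fast)
        v = np.zeros_like(fast)
        for t in range(1, k + 1):
            fast, m, v = adam_step(fast, grad(fast), m, v, t)
        # Outer loop: move the slow weights toward the fast weights.
        # (A simplification; NALA's outer step follows the NAG direction.)
        slow = slow + alpha * (fast - slow)
    return slow

w = lookahead(np.array([3.0, -2.0]))
print(np.linalg.norm(w))  # approaches the minimizer at the origin
```

Resetting the Adam state at each outer step keeps the sketch simple; whether inner-optimizer state is carried across outer steps is a design choice in look-ahead methods.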


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fb8/11232586/62acac9dd396/peerj-cs-10-2167-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fb8/11232586/919271ec0d90/peerj-cs-10-2167-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fb8/11232586/8f5e3447dffa/peerj-cs-10-2167-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fb8/11232586/b41abbb02b3d/peerj-cs-10-2167-g003.jpg

Similar articles

1. NALA: a Nesterov accelerated look-ahead optimizer for deep learning.
   PeerJ Comput Sci. 2024 Jul 3;10:e2167. doi: 10.7717/peerj-cs.2167. eCollection 2024.
2. Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models.
   IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):9508-9520. doi: 10.1109/TPAMI.2024.3423382. Epub 2024 Nov 6.
3. The Strength of Nesterov's Extrapolation in the Individual Convergence of Nonsmooth Optimization.
   IEEE Trans Neural Netw Learn Syst. 2020 Jul;31(7):2557-2568. doi: 10.1109/TNNLS.2019.2933452. Epub 2019 Sep 2.
4. Nesterov-accelerated adaptive momentum estimation-based wavefront distortion correction algorithm.
   Appl Opt. 2021 Aug 20;60(24):7177-7185. doi: 10.1364/AO.428465.
5. Improving the efficiency of RMSProp optimizer by utilizing Nestrove in deep learning.
   Sci Rep. 2023 May 31;13(1):8814. doi: 10.1038/s41598-023-35663-x.
6. Adaptive Restart of the Optimized Gradient Method for Convex Optimization.
   J Optim Theory Appl. 2018 Jul;178(1):240-263. doi: 10.1007/s10957-018-1287-4. Epub 2018 May 7.
7. A Unified Analysis of AdaGrad With Weighted Aggregation and Momentum Acceleration.
   IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):14482-14490. doi: 10.1109/TNNLS.2023.3279381. Epub 2024 Oct 7.
8. Selecting the best optimizers for deep learning-based medical image segmentation.
   Front Radiol. 2023 Sep 21;3:1175473. doi: 10.3389/fradi.2023.1175473. eCollection 2023.
9. Optimized first-order methods for smooth convex minimization.
   Math Program. 2016 Sep;159(1):81-107. doi: 10.1007/s10107-015-0949-3. Epub 2015 Oct 17.
10. Accelerated statistical reconstruction for C-arm cone-beam CT using Nesterov's method.
    Med Phys. 2015 May;42(5):2699-708. doi: 10.1118/1.4914378.
