
NALA: a Nesterov accelerated look-ahead optimizer for deep learning.

Authors

Zuo Xuan, Li Hui-Yan, Gao Shan, Zhang Pu, Du Wan-Ru

Affiliations

School of Automation, Northwestern Polytechnical University, Xi'an, Shaanxi, China.

China Academy of Aerospace Systems Science and Engineering, Beijing, China.

Publication

PeerJ Comput Sci. 2024 Jul 3;10:e2167. doi: 10.7717/peerj-cs.2167. eCollection 2024.

DOI: 10.7717/peerj-cs.2167
PMID: 38983239
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11232586/
Abstract

Adaptive gradient algorithms have been successfully used in deep learning. Previous work reveals that adaptive gradient algorithms mainly borrow the moving-average idea of heavy-ball acceleration to estimate the first- and second-order moments of the gradient for accelerating convergence. However, Nesterov acceleration, which uses the gradient at an extrapolation point, can in theory achieve a faster convergence speed than heavy-ball acceleration. In this article, a new optimization algorithm for deep learning, called NALA, is proposed; it combines an adaptive gradient algorithm with Nesterov acceleration by using a look-ahead scheme. NALA iteratively updates two sets of weights: the 'fast weights' in its inner loop and the 'slow weights' in its outer loop. Concretely, NALA first updates the fast weights k times using the Adam optimizer in the inner loop, and then updates the slow weights once in the direction of Nesterov's Accelerated Gradient (NAG) in the outer loop. We compare NALA with several popular optimization algorithms on a range of image classification tasks on public datasets. The experimental results show that NALA achieves faster convergence and higher accuracy than the other popular optimization algorithms.
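The fast/slow-weights loop described in the abstract can be sketched as follows. This is a minimal NumPy illustration of a generic look-ahead scheme, not the authors' implementation: the toy quadratic loss, the hyperparameter values, and the plain interpolation used for the outer update are all illustrative assumptions (NALA itself moves the slow weights along the NAG direction rather than by simple interpolation).

```python
import numpy as np

def grad(w):
    # Gradient of a toy quadratic loss f(w) = 0.5 * ||w||^2 (illustrative only).
    return w

def adam_step(w, g, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # One standard Adam update: moving averages of the first and second
    # moments of the gradient, with bias correction.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

def lookahead(slow, k=5, outer_steps=20, alpha=0.5):
    for _ in range(outer_steps):
        # Inner loop: k Adam updates on a copy of the weights (the "fast weights").
        fast = slow.copy()
        m = np.zeros_like(fast)
        v = np.zeros_like(fast)
        for t in range(1, k + 1):
            fast, m, v = adam_step(fast, grad(fast), m, v, t)
        # Outer loop: move the slow weights toward the fast weights.
        # (A simplification; NALA's outer step follows the NAG direction.)
        slow = slow + alpha * (fast - slow)
    return slow

w = lookahead(np.array([3.0, -2.0]))
print(np.linalg.norm(w))  # approaches the minimizer at the origin
```

Resetting the Adam state at each outer step keeps the sketch simple; whether inner-optimizer state is carried across outer steps is a design choice in look-ahead methods.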


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fb8/11232586/62acac9dd396/peerj-cs-10-2167-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fb8/11232586/919271ec0d90/peerj-cs-10-2167-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fb8/11232586/8f5e3447dffa/peerj-cs-10-2167-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0fb8/11232586/b41abbb02b3d/peerj-cs-10-2167-g003.jpg

Similar articles

1. NALA: a Nesterov accelerated look-ahead optimizer for deep learning.
   PeerJ Comput Sci. 2024 Jul 3;10:e2167. doi: 10.7717/peerj-cs.2167. eCollection 2024.
2. Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models.
   IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):9508-9520. doi: 10.1109/TPAMI.2024.3423382. Epub 2024 Nov 6.
3. The Strength of Nesterov's Extrapolation in the Individual Convergence of Nonsmooth Optimization.
   IEEE Trans Neural Netw Learn Syst. 2020 Jul;31(7):2557-2568. doi: 10.1109/TNNLS.2019.2933452. Epub 2019 Sep 2.
4. Nesterov-accelerated adaptive momentum estimation-based wavefront distortion correction algorithm.
   Appl Opt. 2021 Aug 20;60(24):7177-7185. doi: 10.1364/AO.428465.
5. Improving the efficiency of RMSProp optimizer by utilizing Nestrove in deep learning.
   Sci Rep. 2023 May 31;13(1):8814. doi: 10.1038/s41598-023-35663-x.
6. Adaptive Restart of the Optimized Gradient Method for Convex Optimization.
   J Optim Theory Appl. 2018 Jul;178(1):240-263. doi: 10.1007/s10957-018-1287-4. Epub 2018 May 7.
7. A Unified Analysis of AdaGrad With Weighted Aggregation and Momentum Acceleration.
   IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):14482-14490. doi: 10.1109/TNNLS.2023.3279381. Epub 2024 Oct 7.
8. Selecting the best optimizers for deep learning-based medical image segmentation.
   Front Radiol. 2023 Sep 21;3:1175473. doi: 10.3389/fradi.2023.1175473. eCollection 2023.
9. Optimized first-order methods for smooth convex minimization.
   Math Program. 2016 Sep;159(1):81-107. doi: 10.1007/s10107-015-0949-3. Epub 2015 Oct 17.
10. Accelerated statistical reconstruction for C-arm cone-beam CT using Nesterov's method.
    Med Phys. 2015 May;42(5):2699-708. doi: 10.1118/1.4914378.
