Suppr 超能文献



Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM.

Authors

Tong Qianqian, Liang Guannan, Bi Jinbo

Affiliation

Computer Science and Engineering, University of Connecticut, Storrs, CT 06269.

Publication

Neurocomputing (Amst). 2022 Apr 7;481:333-356. doi: 10.1016/j.neucom.2022.01.014. Epub 2022 Jan 21.

DOI: 10.1016/j.neucom.2022.01.014
PMID: 35342226
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8951388/
Abstract

Adaptive gradient methods (AGMs) have become popular for optimizing nonconvex problems in deep learning. We revisit AGMs and identify that the adaptive learning rate (A-LR) used by AGMs varies significantly across the dimensions of the problem over epochs (i.e., anisotropic scale), which may lead to issues in convergence and generalization. All existing modified AGMs actually represent efforts to revise the A-LR. Theoretically, we provide a new way to analyze the convergence of AGMs and prove that the convergence rate of Adam also depends on its hyper-parameter ε, which has been overlooked previously. Based on these two facts, we propose a new AGM by calibrating the A-LR with an activation function, resulting in the Sadam and SAMSGrad methods. We further prove that these algorithms enjoy better convergence speed under nonconvex, non-strongly convex, and Polyak-Łojasiewicz conditions compared with Adam. Empirical studies support our observation of the anisotropic A-LR and show that the proposed methods outperform existing AGMs and generalize even better than S-Momentum in multiple deep learning tasks.

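The calibration idea in the abstract — passing Adam's per-coordinate denominator through a smooth, strictly positive activation instead of using the raw √v̂ + ε — can be sketched in a few lines. This is a minimal NumPy illustration, not the authors' implementation: the softplus calibration with a temperature `beta`, and the toy anisotropic quadratic used to exercise it, are assumptions chosen to make the sketch self-contained and runnable.

```python
import numpy as np

def softplus(x, beta=50.0):
    # Numerically stable softplus(beta*x)/beta: smooth, strictly positive,
    # approaches x for large x and a small floor (log 2 / beta) near zero.
    z = beta * x
    return (np.maximum(z, 0.0) + np.log1p(np.exp(-np.abs(z)))) / beta

def calibrated_adam(grad_fn, x0, lr=0.01, b1=0.9, b2=0.999, beta=50.0, steps=1000):
    # Adam-style loop where the A-LR denominator sqrt(v_hat) + eps is
    # replaced by softplus(sqrt(v_hat)) — a hypothetical sketch of the
    # calibration approach, not the paper's exact algorithm.
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)  # first-moment (momentum) estimate
    v = np.zeros_like(x)  # second-moment estimate
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)  # bias correction
        v_hat = v / (1 - b2 ** t)
        x = x - lr * m_hat / softplus(np.sqrt(v_hat), beta)
    return x

# Toy anisotropic objective f(x) = 0.5*(x1^2 + 100*x2^2): the two coordinates
# see gradient scales differing by 100x, mimicking the anisotropic A-LR issue.
grad = lambda x: np.array([1.0, 100.0]) * x
x_star = calibrated_adam(grad, [2.0, 2.0])
```

The floor that softplus imposes near zero plays the role of ε, but smoothly and per-coordinate, which is the intuition behind calibrating the A-LR rather than adding a fixed constant.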

Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/573a/8951388/0445e9cb2dc9/nihms-1778481-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/573a/8951388/04078f83b64a/nihms-1778481-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/573a/8951388/f5bc2d6a367e/nihms-1778481-f0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/573a/8951388/2a736c0dc4f9/nihms-1778481-f0009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/573a/8951388/ba9aed32d9ab/nihms-1778481-f0010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/573a/8951388/44ddce1b170e/nihms-1778481-f0011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/573a/8951388/4b570d10b093/nihms-1778481-f0012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/573a/8951388/7270a66f32d7/nihms-1778481-f0013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/573a/8951388/1c88e2b485e6/nihms-1778481-f0014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/573a/8951388/5e85096a14e9/nihms-1778481-f0015.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/573a/8951388/da26f5e94cd1/nihms-1778481-f0016.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/573a/8951388/69ab4055ac34/nihms-1778481-f0017.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/573a/8951388/363bb2b82438/nihms-1778481-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/573a/8951388/bf2c42294d3f/nihms-1778481-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/573a/8951388/3f2c00c59854/nihms-1778481-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/573a/8951388/e5e52c81fb7c/nihms-1778481-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/573a/8951388/3217ef662487/nihms-1778481-f0005.jpg

Similar Articles

1
Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM.
Neurocomputing (Amst). 2022 Apr 7;481:333-356. doi: 10.1016/j.neucom.2022.01.014. Epub 2022 Jan 21.
2
Stochastic momentum methods for non-convex learning without bounded assumptions.
Neural Netw. 2023 Aug;165:830-845. doi: 10.1016/j.neunet.2023.06.021. Epub 2023 Jun 23.
3
A Unified Analysis of AdaGrad With Weighted Aggregation and Momentum Acceleration.
IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):14482-14490. doi: 10.1109/TNNLS.2023.3279381. Epub 2024 Oct 7.
4
Convergence analysis of AdaBound with relaxed bound functions for non-convex optimization.
Neural Netw. 2022 Jan;145:300-307. doi: 10.1016/j.neunet.2021.10.026. Epub 2021 Nov 8.
5
Painless Stochastic Conjugate Gradient for Large-Scale Machine Learning.
IEEE Trans Neural Netw Learn Syst. 2024 Oct;35(10):14645-14658. doi: 10.1109/TNNLS.2023.3280826. Epub 2024 Oct 7.
6
UAdam: Unified Adam-Type Algorithmic Framework for Nonconvex Optimization.
Neural Comput. 2024 Aug 19;36(9):1912-1938. doi: 10.1162/neco_a_01692.
7
AdaSAM: Boosting sharpness-aware minimization with adaptive learning rate and momentum for training deep neural networks.
Neural Netw. 2024 Jan;169:506-519. doi: 10.1016/j.neunet.2023.10.044. Epub 2023 Nov 1.
8
Towards Understanding Convergence and Generalization of AdamW.
IEEE Trans Pattern Anal Mach Intell. 2024 Sep;46(9):6486-6493. doi: 10.1109/TPAMI.2024.3382294. Epub 2024 Aug 6.
9
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models.
IEEE Trans Pattern Anal Mach Intell. 2024 Dec;46(12):9508-9520. doi: 10.1109/TPAMI.2024.3423382. Epub 2024 Nov 6.
10
Learning Rates for Nonconvex Pairwise Learning.
IEEE Trans Pattern Anal Mach Intell. 2023 Aug;45(8):9996-10011. doi: 10.1109/TPAMI.2023.3259324. Epub 2023 Jun 30.

Cited By

1
Adaptive network steganography using deep learning and multimedia video analysis for enhanced security and fidelity.
PLoS One. 2025 Jun 5;20(6):e0318795. doi: 10.1371/journal.pone.0318795. eCollection 2025.
2
TCAINet: an RGB-T salient object detection model with cross-modal fusion and adaptive decoding.
Sci Rep. 2025 Apr 24;15(1):14266. doi: 10.1038/s41598-025-98423-z.
3
Driver fatigue detection using PPG signal, facial features, head postures with an LSTM model.
Heliyon. 2024 Oct 24;10(21):e39479. doi: 10.1016/j.heliyon.2024.e39479. eCollection 2024 Nov 15.
4
Learnable digital signal processing: a new benchmark of linearity compensation for optical fiber communications.
Light Sci Appl. 2024 Aug 13;13(1):188. doi: 10.1038/s41377-024-01556-5.
5
Identification of Multiple Diseases in Apple Leaf Based on Optimized Lightweight Convolutional Neural Network.
Plants (Basel). 2024 Jun 1;13(11):1535. doi: 10.3390/plants13111535.
6
Bridged-U-Net-ASPP-EVO and Deep Learning Optimization for Brain Tumor Segmentation.
Diagnostics (Basel). 2023 Aug 9;13(16):2633. doi: 10.3390/diagnostics13162633.
7
Smart brain tumor diagnosis system utilizing deep convolutional neural networks.
Multimed Tools Appl. 2023 Apr 28:1-27. doi: 10.1007/s11042-023-15422-w.
8
Towards Automated Optimization of Residual Convolutional Neural Networks for Electrocardiogram Classification.
Cognit Comput. 2023 Feb 15:1-11. doi: 10.1007/s12559-022-10103-6.
9
MSWNet: A visual deep machine learning method adopting transfer learning based upon ResNet 50 for municipal solid waste sorting.
Front Environ Sci Eng. 2023;17(6):77. doi: 10.1007/s11783-023-1677-1. Epub 2023 Jan 1.
10
Interactive framework for Covid-19 detection and segmentation with feedback facility for dynamically improved accuracy and trust.
PLoS One. 2022 Dec 22;17(12):e0278487. doi: 10.1371/journal.pone.0278487. eCollection 2022.

References

1
Variance Reduction in Stochastic Gradient Langevin Dynamics.
Adv Neural Inf Process Syst. 2016 Dec;29:1154-1162.