Efficient learning rate adaptation based on hierarchical optimization approach.

Affiliation

Korea Research Institute of Chemical Technology (KRICT), Republic of Korea.

Publication information

Neural Netw. 2022 Jun;150:326-335. doi: 10.1016/j.neunet.2022.02.014. Epub 2022 Feb 25.

Abstract

This paper proposes a new hierarchical approach to learning rate adaptation in gradient methods, called learning rate optimization (LRO). LRO formulates the learning rate adaptation problem as a hierarchical optimization problem that minimizes the loss function with respect to the learning rate, given the current model parameters and gradients. LRO then optimizes the learning rate using the alternating direction method of multipliers (ADMM). Because this learning rate optimization requires neither second-order information nor a probabilistic model, it is highly efficient. Furthermore, LRO requires no additional hyperparameters compared to the vanilla gradient method with simple exponential learning rate decay. In the experiments, we integrated LRO with vanilla SGD and Adam and compared their optimization performance with state-of-the-art learning rate adaptation methods as well as the most commonly used adaptive gradient methods. SGD and Adam with LRO outperformed all competitors on benchmark datasets for image classification tasks.
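
The abstract describes treating the learning rate itself as a quantity to be optimized at each step rather than following a fixed schedule. As a rough, hedged illustration of that general idea only (this is not the paper's ADMM-based LRO, whose update rules are not reproduced here), the sketch below rescales the step size at every gradient step by evaluating the loss for a few candidate learning rates and keeping the best one, using first-order information only. All names (`sgd_with_adaptive_lr`, `quadratic_loss`, the candidate factors) are hypothetical.

```python
import numpy as np

# Toy illustration (NOT the paper's LRO/ADMM algorithm): treat the learning
# rate as a one-dimensional variable adapted at each step by picking the
# candidate step size that most reduces the loss along the current gradient.

def quadratic_loss(w, A, b):
    """Simple quadratic loss 0.5 * ||A w - b||^2 used only for demonstration."""
    r = A @ w - b
    return 0.5 * float(r @ r)

def sgd_with_adaptive_lr(w, A, b, lr_init=0.1, steps=50, candidates=(0.5, 1.0, 2.0)):
    """Gradient descent where, at every step, the learning rate is rescaled by
    whichever candidate factor yields the lowest loss for the tentative update."""
    lr = lr_init
    for _ in range(steps):
        grad = A.T @ (A @ w - b)  # first-order information only
        # Evaluate the loss for a few rescaled learning rates; keep the best.
        trials = [(quadratic_loss(w - f * lr * grad, A, b), f) for f in candidates]
        _, best_factor = min(trials)
        lr *= best_factor
        w = w - lr * grad
    return w, lr

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(20, 5))
    b = rng.normal(size=20)
    w_opt, lr_final = sgd_with_adaptive_lr(np.zeros(5), A, b)
    print("final loss:", quadratic_loss(w_opt, A, b), "final lr:", lr_final)
```

The design choice of multiplicatively rescaling the current learning rate keeps the adaptation cheap and hyperparameter-light, echoing the abstract's claim of efficiency without second-order information, but the actual LRO method should be taken from the paper itself.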
