Korea Research Institute of Chemical Technology (KRICT), Republic of Korea.
Neural Netw. 2022 Jun;150:326-335. doi: 10.1016/j.neunet.2022.02.014. Epub 2022 Feb 25.
This paper proposes a new hierarchical approach to learning rate adaptation in gradient methods, called learning rate optimization (LRO). LRO formulates the learning rate adaptation problem as a hierarchical optimization problem that minimizes the loss function with respect to the learning rate for the current model parameters and gradients. LRO then optimizes the learning rate based on the alternating direction method of multipliers (ADMM). In this learning rate optimization, LRO requires neither second-order information nor a probabilistic model, so it is highly efficient. Furthermore, LRO introduces no additional hyperparameters compared to the vanilla gradient method with simple exponential learning rate decay. In the experiments, we integrated LRO with vanilla SGD and Adam, and compared their optimization performance with state-of-the-art learning rate adaptation methods as well as the most commonly used adaptive gradient methods. SGD and Adam with LRO outperformed all competitors on the benchmark datasets in image classification tasks.
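The abstract does not spell out the ADMM update rules, but the hierarchical idea it describes, treating the learning rate itself as an optimization variable that minimizes the loss for the current parameters and gradient, can be sketched in a few lines. The following Python sketch illustrates that bilevel structure on a toy least-squares loss; it replaces the paper's ADMM solver with a few first-order inner steps on the learning rate, and every name here (toy_loss, toy_grad, lro_sgd_step, eta_lr, inner_steps) is an illustrative assumption rather than the authors' implementation.

```python
# Minimal sketch of the hierarchical idea behind learning rate optimization (LRO):
# at each outer step, choose the learning rate eta to (approximately) minimize
# L(theta - eta * g) for the current parameters theta and gradient g.
# The inner problem is solved here with plain scalar gradient descent on eta,
# NOT the paper's ADMM-based solver; the point is only that first-order
# information suffices, as the abstract states.

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)

def toy_loss(theta):
    """Least-squares loss 0.5 * ||A theta - b||^2 (stand-in for a network loss)."""
    r = A @ theta - b
    return 0.5 * float(r @ r)

def toy_grad(theta):
    return A.T @ (A @ theta - b)

def lro_sgd_step(theta, eta, inner_steps=5, eta_lr=1e-4):
    """One SGD step whose learning rate is itself optimized (hierarchical / bilevel)."""
    g = toy_grad(theta)
    for _ in range(inner_steps):
        # d/d_eta L(theta - eta * g) = -g . grad L(theta - eta * g): first-order info only.
        d_eta = -float(g @ toy_grad(theta - eta * g))
        eta = max(eta - eta_lr * d_eta, 0.0)  # keep the step size non-negative
    return theta - eta * g, eta

theta = np.zeros(5)
eta = 0.01
for step in range(50):
    theta, eta = lro_sgd_step(theta, eta)
print(f"final loss {toy_loss(theta):.4f}, adapted eta {eta:.4f}")
```

In this sketch the learning rate carries over between outer steps, so it adapts gradually toward a value suited to the local curvature; how the actual method couples the inner ADMM solve with SGD and Adam is detailed in the paper, not here.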