Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks

Author Information

Shrihari Vasudevan

Affiliation

IBM Research, Bangalore 560045, India.

Publication Information

Entropy (Basel). 2020 May 17;22(5):560. doi: 10.3390/e22050560.

Abstract

This paper demonstrates a novel approach to training deep neural networks using a Mutual Information (MI)-driven, decaying Learning Rate (LR), Stochastic Gradient Descent (SGD) algorithm. The MI between the output of the neural network and the true outcomes is used to adaptively set the LR for the network in every epoch of the training cycle. This idea is extended to a layer-wise setting of the LR, as MI naturally provides a layer-wise performance metric. An LR range test for determining the operating LR range is also proposed. Experiments compared this approach with popular gradient-based adaptive LR algorithms such as Adam, RMSprop, and LARS. Accuracy outcomes that were competitive with or better than these alternatives, obtained in competitive or better time, demonstrate the feasibility of the metric and approach.
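The abstract describes mapping the MI between network outputs and true labels to a per-epoch LR. As a rough illustration of that idea only (not the paper's exact schedule), the Python sketch below estimates MI with scikit-learn's mutual_info_score on predicted versus true labels, normalizes it by the label entropy, and decays the LR linearly as that ratio approaches 1. The linear mapping, the lr_min/lr_max bounds, and the use of argmax predictions on a held-out split are assumptions made for illustration.

import numpy as np
from sklearn.metrics import mutual_info_score  # discrete MI estimator

def mi_based_lr(y_true, y_pred, lr_min, lr_max):
    """Illustrative MI-to-LR mapping: decay LR as MI approaches label entropy.

    y_true, y_pred: integer class labels (e.g., argmax of network outputs
    on a held-out split). The linear schedule here is an assumption, not the
    paper's exact rule.
    """
    mi = mutual_info_score(y_true, y_pred)             # MI in nats
    _, counts = np.unique(y_true, return_counts=True)
    p = counts / counts.sum()
    h = -(p * np.log(p)).sum()                         # entropy of true labels (nats)
    progress = mi / h if h > 0 else 1.0                # 0 = uninformative, 1 = labels fully explained
    return lr_max - (lr_max - lr_min) * progress

# Hypothetical per-epoch usage with any SGD optimizer exposing param groups:
#   lr = mi_based_lr(y_true, y_pred, lr_min=1e-4, lr_max=1e-1)
#   for group in optimizer.param_groups:
#       group["lr"] = lr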


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bb88/7517082/6b19f8c295cf/entropy-22-00560-g001.jpg
