Learning Rates for Stochastic Gradient Descent With Nonconvex Objectives.

Authors

Lei Yunwen, Tang Ke

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2021 Dec;43(12):4505-4511. doi: 10.1109/TPAMI.2021.3068154. Epub 2021 Nov 3.

Abstract

Stochastic gradient descent (SGD) has become the method of choice for training highly complex and nonconvex models, since it can not only find solutions with small training error but also generalize well. In the literature, computational and statistical properties are studied separately to understand the behavior of SGD. However, there is a lack of work that jointly considers the computational and statistical properties in a nonconvex learning setting. In this paper, we develop novel learning rates of SGD for nonconvex learning by presenting high-probability bounds for both computational and statistical errors. We show that the complexity of SGD iterates grows in a controllable manner with respect to the iteration number, which sheds light on how implicit regularization can be achieved by tuning the number of passes to balance the computational and statistical errors. As a byproduct, we also slightly refine the existing studies on the uniform convergence of gradients by showing its connection to Rademacher chaos complexities.
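
To make the setting concrete, the following is a minimal sketch of SGD on a nonconvex empirical risk, where the number of passes over the training data is the knob that trades off computational (optimization) error against statistical (generalization) error. This is an illustration only, not the paper's algorithm or analysis: the toy sigmoid-squared loss, the decaying step size, and the synthetic data are assumptions made for the example.

```python
# Minimal sketch (not the paper's algorithm): plain SGD on a nonconvex
# empirical risk, where the number of passes acts as an implicit regularizer.
# The objective, step-size schedule, and data below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def loss_and_grad(w, x, y):
    """Squared loss on a sigmoid output, 0.5*(sigmoid(x.w) - y)^2 (nonconvex in w)."""
    p = 1.0 / (1.0 + np.exp(-(x @ w)))     # sigmoid prediction
    g = (p - y) * p * (1.0 - p) * x        # gradient w.r.t. w
    return 0.5 * (p - y) ** 2, g

def sgd(X, Y, num_passes, eta0=0.5):
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(num_passes):            # tuning num_passes balances
        for i in rng.permutation(n):       # optimization vs. statistical error
            t += 1
            eta = eta0 / np.sqrt(t)        # decaying step size (an assumption)
            _, g = loss_and_grad(w, X[i], Y[i])
            w -= eta * g
    return w

# Synthetic data: noisy linear scores thresholded to {0, 1} labels.
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
Y = (X @ w_true + 0.5 * rng.normal(size=n) > 0).astype(float)

for num_passes in (1, 5, 25):
    w = sgd(X, Y, num_passes)
    train_loss = np.mean([loss_and_grad(w, X[i], Y[i])[0] for i in range(n)])
    print(f"passes={num_passes:2d}  training loss={train_loss:.4f}")
```

Running more passes drives the training loss down (smaller computational error) but eventually increases the statistical error, so in this sense the pass count itself plays the role of a regularizer, which is the balance the paper's high-probability bounds make precise.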
