
Learning Rates for Stochastic Gradient Descent With Nonconvex Objectives.

Authors

Lei Yunwen, Tang Ke

Publication

IEEE Trans Pattern Anal Mach Intell. 2021 Dec;43(12):4505-4511. doi: 10.1109/TPAMI.2021.3068154. Epub 2021 Nov 3.

DOI: 10.1109/TPAMI.2021.3068154
PMID: 33755555
Abstract

Stochastic gradient descent (SGD) has become the method of choice for training highly complex and nonconvex models since it can not only recover good solutions to minimize training errors but also generalize well. Computational and statistical properties are separately studied to understand the behavior of SGD in the literature. However, there is a lacking study to jointly consider the computational and statistical properties in a nonconvex learning setting. In this paper, we develop novel learning rates of SGD for nonconvex learning by presenting high-probability bounds for both computational and statistical errors. We show that the complexity of SGD iterates grows in a controllable manner with respect to the iteration number, which sheds insights on how an implicit regularization can be achieved by tuning the number of passes to balance the computational and statistical errors. As a byproduct, we also slightly refine the existing studies on the uniform convergence of gradients by showing its connection to Rademacher chaos complexities.
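To make the abstract's central idea concrete, the sketch below runs multi-pass SGD with a decaying step size on a toy nonconvex problem and compares the training error for different numbers of passes. It is only an illustration of using the pass count as an implicit regularizer; the model y ≈ sin(<w, x>) and the eta0/sqrt(t) step size are assumptions for the example and do not reproduce the paper's algorithmic choices or its high-probability bounds.

```python
import numpy as np

# Toy nonconvex regression: y ~= sin(<w, x>) with squared loss.
# This model is a hypothetical stand-in, not the setting analyzed in the paper.
rng = np.random.default_rng(0)
n, d = 500, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = np.sin(X @ w_true) + 0.1 * rng.normal(size=n)

def grad(w, x_i, y_i):
    """Stochastic gradient of the squared loss for the prediction sin(<w, x_i>)."""
    pred = np.sin(x_i @ w)
    return 2.0 * (pred - y_i) * np.cos(x_i @ w) * x_i

def sgd(num_passes, eta0=0.1):
    """Run `num_passes` epochs of SGD with step size eta0 / sqrt(t)."""
    w = np.zeros(d)
    t = 0
    for _ in range(num_passes):
        for i in rng.permutation(n):
            t += 1
            w -= eta0 / np.sqrt(t) * grad(w, X[i], y[i])
    return w

# More passes drive the training (computational) error down; on held-out data,
# too many passes would eventually hurt the statistical error, so the pass
# count acts as an implicit regularizer.
for passes in (1, 5, 25):
    w_hat = sgd(passes)
    train_mse = np.mean((np.sin(X @ w_hat) - y) ** 2)
    print(f"passes={passes:2d}  train MSE={train_mse:.4f}")
```

In this sketch the training error keeps shrinking as the number of passes grows, while generalization on fresh data would eventually degrade; balancing these two effects by tuning the number of passes is the computational-statistical trade-off the abstract refers to.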


Similar Articles

1. Learning Rates for Stochastic Gradient Descent With Nonconvex Objectives.
   IEEE Trans Pattern Anal Mach Intell. 2021 Dec;43(12):4505-4511. doi: 10.1109/TPAMI.2021.3068154. Epub 2021 Nov 3.
2. Stochastic Gradient Descent for Nonconvex Learning Without Bounded Gradient Assumptions.
   IEEE Trans Neural Netw Learn Syst. 2020 Oct;31(10):4394-4400. doi: 10.1109/TNNLS.2019.2952219. Epub 2019 Dec 11.
3. Optimal computational and statistical rates of convergence for sparse nonconvex learning problems.
   Ann Stat. 2014;42(6):2164-2201. doi: 10.1214/14-AOS1238.
4. Learning Rates for Nonconvex Pairwise Learning.
   IEEE Trans Pattern Anal Mach Intell. 2023 Aug;45(8):9996-10011. doi: 10.1109/TPAMI.2023.3259324. Epub 2023 Jun 30.
5. A mean field view of the landscape of two-layer neural networks.
   Proc Natl Acad Sci U S A. 2018 Aug 14;115(33):E7665-E7671. doi: 10.1073/pnas.1806579115. Epub 2018 Jul 27.
6. Preconditioned Stochastic Gradient Descent.
   IEEE Trans Neural Netw Learn Syst. 2018 May;29(5):1454-1466. doi: 10.1109/TNNLS.2017.2672978. Epub 2017 Mar 9.
7. Faster Stochastic Quasi-Newton Methods.
   IEEE Trans Neural Netw Learn Syst. 2022 Sep;33(9):4388-4397. doi: 10.1109/TNNLS.2021.3056947. Epub 2022 Aug 31.
8. The inverse variance-flatness relation in stochastic gradient descent is critical for finding flat minima.
   Proc Natl Acad Sci U S A. 2021 Mar 2;118(9). doi: 10.1073/pnas.2015617118.
9. Gradient Descent with Random Initialization: Fast Global Convergence for Nonconvex Phase Retrieval.
   Math Program. 2019 Jul;176(1-2):5-37. doi: 10.1007/s10107-019-01363-6. Epub 2019 Feb 4.
10. Refined Rademacher chaos complexity bounds with applications to the multikernel learning problem.
    Neural Comput. 2014 Apr;26(4):739-60. doi: 10.1162/NECO_a_00566. Epub 2014 Jan 30.

Cited By

1. Discretizing multiple continuous predictors with U-shaped relationships with lnOR: introducing the recursive gradient scanning method in clinical and epidemiological research.
   BMC Med Res Methodol. 2025 Mar 12;25(1):70. doi: 10.1186/s12874-025-02522-4.
2. SSATNet: Spectral-spatial attention transformer for hyperspectral corn image classification.
   Front Plant Sci. 2025 Jan 16;15:1458978. doi: 10.3389/fpls.2024.1458978. eCollection 2024.
3. Efficient residual network using hyperspectral images for corn variety identification.
   Front Plant Sci. 2024 Apr 16;15:1376915. doi: 10.3389/fpls.2024.1376915. eCollection 2024.