Stability analysis of stochastic gradient descent for homogeneous neural networks and linear classifiers.

Affiliations

Department of Computer Science and Software Engineering, Laval University, Pavillon Adrien-Pouliot, 1065 av. de la Médecine, Québec, QC G1V 0A6, Canada.

Publication Information

Neural Netw. 2023 Jul;164:382-394. doi: 10.1016/j.neunet.2023.04.028. Epub 2023 Apr 25.

DOI: 10.1016/j.neunet.2023.04.028
PMID: 37167751
Abstract

We prove new generalization bounds for stochastic gradient descent when training classifiers with invariances. Our analysis is based on the stability framework and covers both the convex case of linear classifiers and the non-convex case of homogeneous neural networks. We analyze stability with respect to the normalized version of the loss function used for training. This leads to investigating a form of angle-wise stability instead of Euclidean stability in the weights. For neural networks, the measure of distance we consider is invariant to rescaling the weights of each layer. Furthermore, we exploit the notion of on-average stability in order to obtain a data-dependent quantity in the bound. This data-dependent quantity is seen to be more favorable when training with larger learning rates in our numerical experiments. This might help to shed some light on why larger learning rates can lead to better generalization in some practical scenarios.
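
To make the rescaling invariance concrete, here is a minimal NumPy sketch (not from the paper) for a two-layer ReLU network, which is positively homogeneous of degree 2 in its weights. Scaling one layer up and the other down leaves the function unchanged, and a per-layer angle-based distance, an illustrative choice of the kind of scale-free metric the abstract describes, registers zero change where a Euclidean distance on the weights would not.

import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def forward(x, W1, W2):
    # Two-layer ReLU network; positively homogeneous of degree 2 in (W1, W2).
    return relu(x @ W1) @ W2

def layerwise_angle_distance(A1, A2, B1, B2):
    # Illustrative per-layer "angle-wise" distance: compares weight
    # directions and ignores per-layer scale. (A hypothetical choice;
    # the paper's exact metric may differ.)
    def angle(U, V):
        u, v = U.ravel(), V.ravel()
        c = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.arccos(np.clip(c, -1.0, 1.0))
    return angle(A1, B1) + angle(A2, B2)

x = rng.normal(size=(5, 8))
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 1))

# Rescale the two layers in opposite directions: the function is unchanged.
alpha = 3.7
W1s, W2s = alpha * W1, W2 / alpha
print(np.allclose(forward(x, W1, W2), forward(x, W1s, W2s)))   # True

# The angle-wise distance is zero, while the Euclidean distance is not.
print(layerwise_angle_distance(W1, W2, W1s, W2s))               # ~0.0
print(np.linalg.norm(W1 - W1s) + np.linalg.norm(W2 - W2s))      # > 0

# Degree-2 homogeneity: scaling all weights by c > 0 scales outputs by c**2.
c = 2.0
print(np.allclose(forward(x, c * W1, c * W2), c ** 2 * forward(x, W1, W2)))  # True

The only design point here is that the metric depends on weight directions rather than norms; any such metric is blind to the per-layer rescalings under which a homogeneous network computes the same function, which is what makes angle-wise stability a natural object of study.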

Similar Articles

1. Stability analysis of stochastic gradient descent for homogeneous neural networks and linear classifiers. Neural Netw. 2023 Jul;164:382-394. doi: 10.1016/j.neunet.2023.04.028. Epub 2023 Apr 25.
2. Dynamics in Deep Classifiers Trained with the Square Loss: Normalization, Low Rank, Neural Collapse, and Generalization Bounds. Research (Wash D C). 2023 Mar 8;6:0024. doi: 10.34133/research.0024. eCollection 2023.
3. Improving generalization of deep neural networks by leveraging margin distribution. Neural Netw. 2022 Jul;151:48-60. doi: 10.1016/j.neunet.2022.03.019. Epub 2022 Mar 17.
4. Learning matrix factorization with scalable distance metric and regularizer. Neural Netw. 2023 Apr;161:254-266. doi: 10.1016/j.neunet.2023.01.034. Epub 2023 Feb 3.
5. Invariance, Encodings, and Generalization: Learning Identity Effects With Neural Networks. Neural Comput. 2022 Jul 14;34(8):1756-1789. doi: 10.1162/neco_a_01510.
6. To understand double descent, we need to understand VC theory. Neural Netw. 2024 Jan;169:242-256. doi: 10.1016/j.neunet.2023.10.014. Epub 2023 Oct 16.
7. Decentralized stochastic sharpness-aware minimization algorithm. Neural Netw. 2024 Aug;176:106325. doi: 10.1016/j.neunet.2024.106325. Epub 2024 Apr 17.
8. Learning smooth dendrite morphological neurons by stochastic gradient descent for pattern classification. Neural Netw. 2023 Nov;168:665-676. doi: 10.1016/j.neunet.2023.09.033. Epub 2023 Sep 25.
9. High-dimensional dynamics of generalization error in neural networks. Neural Netw. 2020 Dec;132:428-446. doi: 10.1016/j.neunet.2020.08.022. Epub 2020 Sep 5.
10. Learning Fixed Points of Recurrent Neural Networks by Reparameterizing the Network Model. Neural Comput. 2024 Jul 19;36(8):1568-1600. doi: 10.1162/neco_a_01681.