Dynamics in Deep Classifiers Trained with the Square Loss: Normalization, Low Rank, Neural Collapse, and Generalization Bounds.

Author Information

Xu Mengjia, Rangamani Akshay, Liao Qianli, Galanti Tomer, Poggio Tomaso

Affiliations

Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA.

Division of Applied Mathematics, Brown University, Providence, RI, USA.

Publication Information

Research (Wash D C). 2023 Mar 8;6:0024. doi: 10.34133/research.0024. eCollection 2023.

DOI: 10.34133/research.0024
PMID: 37223467
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10202460/
Abstract

We overview several properties, old and new, of training overparameterized deep networks under the square loss. We first consider a model of the dynamics of gradient flow under the square loss in deep homogeneous rectified linear unit networks. We study the convergence to a solution with the absolute minimum ρ, which is the product of the Frobenius norms of each layer weight matrix, when normalization by Lagrange multipliers is used together with weight decay under different forms of gradient descent. A main property of the minimizers that bound their expected error for a specific network architecture is ρ. In particular, we derive novel norm-based bounds for convolutional layers that are orders of magnitude better than classical bounds for dense networks. Next, we prove that quasi-interpolating solutions obtained by stochastic gradient descent in the presence of weight decay have a bias toward low-rank weight matrices, which should improve generalization. The same analysis predicts the existence of an inherent stochastic gradient descent noise for deep networks. In both cases, we verify our predictions experimentally. We then predict neural collapse and its properties without any specific assumption, unlike other published proofs. Our analysis supports the idea that the advantage of deep networks relative to other classifiers is greater for problems that are appropriate for sparse deep architectures such as convolutional neural networks. The reason is that compositionally sparse target functions can be approximated well by "sparse" deep networks without incurring the curse of dimensionality.
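To make the central quantity of the abstract concrete, here is a minimal illustrative sketch, not taken from the paper's code: it computes ρ, the product of per-layer Frobenius norms, for a small fully connected ReLU classifier with hypothetical random weights, along with an entropy-based effective-rank statistic of the kind one might track to observe the low-rank bias discussed above. The layer shapes, the random initialization, and the effective-rank definition are all assumptions for illustration; the paper's bound concerns trained minimizers, not random weights.

```python
# Illustrative sketch only: compute rho (product of per-layer Frobenius norms)
# and an effective-rank measure for hypothetical random weight matrices.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer shapes (fan_out, fan_in) for a 4-layer fully connected network.
shapes = [(512, 784), (256, 512), (128, 256), (10, 128)]
weights = [rng.normal(scale=1.0 / np.sqrt(fan_in), size=(fan_out, fan_in))
           for (fan_out, fan_in) in shapes]

# rho = product over layers of the Frobenius norm of each weight matrix.
rho = np.prod([np.linalg.norm(W, ord="fro") for W in weights])

def effective_rank(W: np.ndarray) -> float:
    """Entropy-based effective rank computed from the singular value spectrum."""
    s = np.linalg.svd(W, compute_uv=False)
    p = s / s.sum()
    return float(np.exp(-(p * np.log(p + 1e-12)).sum()))

print(f"rho (product of Frobenius norms): {rho:.4f}")
for i, W in enumerate(weights, start=1):
    print(f"layer {i}: ||W||_F = {np.linalg.norm(W):.3f}, "
          f"effective rank = {effective_rank(W):.1f} (full rank = {min(W.shape)})")
```

Under the low-rank bias the abstract describes, one would expect the effective rank of the trained weight matrices (unlike these random ones) to fall well below full rank when weight decay is used.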

Figures (from PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/10ee/10202460/3bd21b0afd6b/research.0024.fig.001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/10ee/10202460/951989815515/research.0024.fig.002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/10ee/10202460/194a56afdacf/research.0024.fig.003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/10ee/10202460/7ef43e30bbab/research.0024.fig.004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/10ee/10202460/cbcab7e397e5/research.0024.fig.005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/10ee/10202460/dc3a6451bfde/research.0024.fig.006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/10ee/10202460/73cde116379d/research.0024.fig.007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/10ee/10202460/50ec47bf23df/research.0024.fig.008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/10ee/10202460/d21911757745/research.0024.fig.009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/10ee/10202460/98a1cd71774a/research.0024.fig.0010.jpg

Similar Articles

1. Dynamics in Deep Classifiers Trained with the Square Loss: Normalization, Low Rank, Neural Collapse, and Generalization Bounds.
   Research (Wash D C). 2023 Mar 8;6:0024. doi: 10.34133/research.0024. eCollection 2023.
2. Theoretical issues in deep networks.
   Proc Natl Acad Sci U S A. 2020 Dec 1;117(48):30039-30045. doi: 10.1073/pnas.1907369117. Epub 2020 Jun 9.
3. Stability analysis of stochastic gradient descent for homogeneous neural networks and linear classifiers.
   Neural Netw. 2023 Jul;164:382-394. doi: 10.1016/j.neunet.2023.04.028. Epub 2023 Apr 25.
4. High-dimensional dynamics of generalization error in neural networks.
   Neural Netw. 2020 Dec;132:428-446. doi: 10.1016/j.neunet.2020.08.022. Epub 2020 Sep 5.
5. Going Deeper, Generalizing Better: An Information-Theoretic View for Deep Learning.
   IEEE Trans Neural Netw Learn Syst. 2024 Nov;35(11):16683-16695. doi: 10.1109/TNNLS.2023.3297113. Epub 2024 Oct 29.
6. Optimizing neural networks for medical data sets: A case study on neonatal apnea prediction.
   Artif Intell Med. 2019 Jul;98:59-76. doi: 10.1016/j.artmed.2019.07.008. Epub 2019 Jul 25.
7. Stochastic Mirror Descent on Overparameterized Nonlinear Models.
   IEEE Trans Neural Netw Learn Syst. 2022 Dec;33(12):7717-7727. doi: 10.1109/TNNLS.2021.3087480. Epub 2022 Nov 30.
8. Convergence of deep convolutional neural networks.
   Neural Netw. 2022 Sep;153:553-563. doi: 10.1016/j.neunet.2022.06.031. Epub 2022 Jun 30.
9. Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup.
   J Stat Mech. 2020 Dec;2020(12):124010. doi: 10.1088/1742-5468/abc61e. Epub 2020 Dec 21.
10. Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness.
    Neural Netw. 2020 Oct;130:85-99. doi: 10.1016/j.neunet.2020.06.024. Epub 2020 Jul 3.

Cited By

1. Nature-Inspired Intelligent Computing: A Comprehensive Survey.
   Research (Wash D C). 2024 Aug 16;7:0442. doi: 10.34133/research.0442. eCollection 2024.
2. Efficient Simultaneous Detection of Metabolites Based on Electroenzymatic Assembly Strategy.
   BME Front. 2023 Sep 19;4:0027. doi: 10.34133/bmef.0027. eCollection 2023.

References

1. Prevalence of neural collapse during the terminal phase of deep learning training.
   Proc Natl Acad Sci U S A. 2020 Oct 6;117(40):24652-24663. doi: 10.1073/pnas.2015509117. Epub 2020 Sep 21.
2. Theoretical issues in deep networks.
   Proc Natl Acad Sci U S A. 2020 Dec 1;117(48):30039-30045. doi: 10.1073/pnas.1907369117. Epub 2020 Jun 9.
3. A mean field view of the landscape of two-layer neural networks.
   Proc Natl Acad Sci U S A. 2018 Aug 14;115(33):E7665-E7671. doi: 10.1073/pnas.1806579115. Epub 2018 Jul 27.