Xu Mengjia, Rangamani Akshay, Liao Qianli, Galanti Tomer, Poggio Tomaso
Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA, USA.
Division of Applied Mathematics, Brown University, Providence, RI, USA.
Research (Wash D C). 2023 Mar 8;6:0024. doi: 10.34133/research.0024. eCollection 2023.
We overview several properties, old and new, of training overparameterized deep networks under the square loss. We first consider a model of the dynamics of gradient flow under the square loss in deep homogeneous rectified linear unit (ReLU) networks. We study the convergence to a solution with the absolute minimum ρ, which is the product of the Frobenius norms of each layer's weight matrix, when normalization by Lagrange multipliers is used together with weight decay under different forms of gradient descent. A main property of the minimizers that bounds their expected error for a specific network architecture is ρ. In particular, we derive novel norm-based bounds for convolutional layers that are orders of magnitude better than classical bounds for dense networks. Next, we prove that quasi-interpolating solutions obtained by stochastic gradient descent in the presence of weight decay have a bias toward low-rank weight matrices, which should improve generalization. The same analysis predicts the existence of an inherent stochastic gradient descent noise for deep networks. In both cases, we verify our predictions experimentally. We then predict neural collapse and its properties without any specific assumption, unlike other published proofs. Our analysis supports the idea that the advantage of deep networks relative to other classifiers is greater for problems that are appropriate for sparse deep architectures such as convolutional neural networks. The reason is that compositionally sparse target functions can be approximated well by "sparse" deep networks without incurring the curse of dimensionality.
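To make the role of ρ concrete, the following minimal NumPy sketch (our own illustration, not code from the paper; shapes and names are arbitrary) shows the homogeneity property behind it: a bias-free ReLU network's output equals ρ times the output of the same network after every weight matrix is rescaled to unit Frobenius norm, where ρ is the product of the per-layer Frobenius norms.

    import numpy as np

    # A bias-free ReLU network is positively homogeneous in each layer's weights,
    # so its output can be written as rho * f_bar(x), where rho is the product of
    # the per-layer Frobenius norms and f_bar is the same network with every
    # weight matrix normalized to unit Frobenius norm.
    rng = np.random.default_rng(0)
    weights = [rng.standard_normal((16, 8)),   # layer 1: 8 -> 16
               rng.standard_normal((8, 16)),   # layer 2: 16 -> 8
               rng.standard_normal((1, 8))]    # output layer: 8 -> 1

    def relu_net(ws, x):
        h = x
        for W in ws[:-1]:
            h = np.maximum(W @ h, 0.0)         # ReLU hidden layers, no biases
        return ws[-1] @ h                      # linear output layer

    x = rng.standard_normal(8)
    rho = np.prod([np.linalg.norm(W) for W in weights])     # product of Frobenius norms
    normalized = [W / np.linalg.norm(W) for W in weights]   # unit-norm layers

    print(relu_net(weights, x))
    print(rho * relu_net(normalized, x))       # identical up to floating-point error

The two printed values agree because ReLU is positively homogeneous and there are no bias terms, which is why ρ alone controls the scale of the network and appears in the norm-based generalization bounds mentioned above.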
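The predicted bias toward low-rank weight matrices can be probed with a simple experiment along the lines of the PyTorch sketch below. This is a hypothetical setup: the architecture, data, and hyperparameters are illustrative and are not those used in the paper; only the recipe (square loss, weight decay, then inspecting the singular-value spectrum of a hidden weight matrix) reflects the prediction being tested.

    import torch
    import torch.nn as nn

    # Illustrative check of the low-rank prediction: train a small bias-free ReLU
    # network with the square loss and weight decay, then look at how concentrated
    # the singular values of a hidden-layer weight matrix are.
    torch.manual_seed(0)
    X = torch.randn(512, 20)
    y = torch.sign(X[:, :1])                   # simple +/-1 target for the sketch

    model = nn.Sequential(nn.Linear(20, 64, bias=False),
                          nn.ReLU(),
                          nn.Linear(64, 64, bias=False),
                          nn.ReLU(),
                          nn.Linear(64, 1, bias=False))
    opt = torch.optim.SGD(model.parameters(), lr=0.05, weight_decay=1e-3)

    for step in range(2000):                   # full-batch gradient steps for simplicity
        opt.zero_grad()
        loss = ((model(X) - y) ** 2).mean()    # square loss, as in the paper
        loss.backward()
        opt.step()

    W = model[2].weight.detach()               # hidden 64x64 weight matrix
    s = torch.linalg.svdvals(W)                # singular values, descending
    print("fraction of spectrum in top 5 singular values:", (s[:5].sum() / s.sum()).item())

A spectrum dominated by a few singular values indicates an approximately low-rank weight matrix; repeating the run without weight decay gives a baseline for comparison.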