High-dimensional dynamics of generalization error in neural networks.

Affiliations

Center for Brain Science, Harvard University, Cambridge, MA 02138, United States of America.

Publication Information

Neural Netw. 2020 Dec;132:428-446. doi: 10.1016/j.neunet.2020.08.022. Epub 2020 Sep 5.

DOI: 10.1016/j.neunet.2020.08.022
PMID: 33022471
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7685244/
Abstract

We perform an analysis of the average generalization dynamics of large neural networks trained using gradient descent. We study the practically-relevant "high-dimensional" regime where the number of free parameters in the network is on the order of or even larger than the number of examples in the dataset. Using random matrix theory and exact solutions in linear models, we derive the generalization error and training error dynamics of learning and analyze how they depend on the dimensionality of data and signal to noise ratio of the learning problem. We find that the dynamics of gradient descent learning naturally protect against overtraining and overfitting in large networks. Overtraining is worst at intermediate network sizes, when the effective number of free parameters equals the number of samples, and thus can be reduced by making a network smaller or larger. Additionally, in the high-dimensional regime, low generalization error requires starting with small initial weights. We then turn to non-linear neural networks, and show that making networks very large does not harm their generalization performance. On the contrary, it can in fact reduce overtraining, even without early stopping or regularization of any sort. We identify two novel phenomena underlying this behavior in overcomplete models: first, there is a frozen subspace of the weights in which no learning occurs under gradient descent; and second, the statistical properties of the high-dimensional regime yield better-conditioned input correlations which protect against overtraining. We demonstrate that standard application of theories such as Rademacher complexity are inaccurate in predicting the generalization performance of deep neural networks, and derive an alternative bound which incorporates the frozen subspace and conditioning effects and qualitatively matches the behavior observed in simulation.
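The setup the abstract describes can be illustrated with a minimal simulation, shown below. This is a sketch, not the authors' code: a linear student trained by full-batch gradient descent on noisy data from a linear teacher, tracking training and test error, measuring overtraining as the rise of test error past its minimum, and checking the frozen-subspace claim (no learning along directions outside the row space of the training inputs). The specific dimensions, learning rate, SNR, and step count are illustrative assumptions.

import numpy as np

def train_linear(n_samples, dim, snr=5.0, lr=0.05, steps=3000, init_scale=1e-3, seed=0):
    """Full-batch gradient descent on squared loss for a linear model y = x @ w."""
    rng = np.random.default_rng(seed)
    w_star = rng.normal(size=dim) / np.sqrt(dim)      # teacher weights, signal variance ~ 1
    noise_std = 1.0 / np.sqrt(snr)                    # SNR = signal variance / noise variance
    X = rng.normal(size=(n_samples, dim))
    y = X @ w_star + noise_std * rng.normal(size=n_samples)
    X_te = rng.normal(size=(4000, dim))
    y_te = X_te @ w_star + noise_std * rng.normal(size=4000)

    w0 = init_scale * rng.normal(size=dim)            # small initial weights, as the abstract recommends
    w = w0.copy()
    train_err, test_err = [], []
    for _ in range(steps):
        resid = X @ w - y
        w -= lr * (X.T @ resid) / n_samples           # gradient of 0.5 * mean squared training error
        train_err.append(np.mean(resid ** 2))
        test_err.append(np.mean((X_te @ w - y_te) ** 2))
    return np.array(train_err), np.array(test_err), w0, w, X

def overtraining(test_err):
    """Rise of test error past its best value along the trajectory (no early stopping)."""
    return test_err[-1] - test_err.min()

n = 200
for dim in (50, 200, 800):  # under-parameterized, near the interpolation threshold, over-parameterized
    tr, te, w0, w, X = train_linear(n, dim)
    print(f"dim={dim:4d}  final train={tr[-1]:.3f}  best test={te.min():.3f}  "
          f"overtraining={overtraining(te):.3f}")
    if dim > n:
        # Frozen subspace: every gradient step lies in the row space of X, so the component
        # of w - w0 orthogonal to that row space should remain (numerically) zero.
        _, _, Vt = np.linalg.svd(X, full_matrices=False)  # rows of Vt span the row space of X
        delta = w - w0
        frozen = delta - Vt.T @ (Vt @ delta)
        print(f"          norm of update outside row space: {np.linalg.norm(frozen):.2e}")

Under these assumed settings, the overtraining column should be largest for dim close to n and smaller for both the smaller and larger networks, and the frozen-subspace norm should print as numerically zero, consistent with the behavior the abstract describes.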


[Figures 1–14 are available via the PMC full-text link above.]

Similar Articles

1. High-dimensional dynamics of generalization error in neural networks.
Neural Netw. 2020 Dec;132:428-446. doi: 10.1016/j.neunet.2020.08.022. Epub 2020 Sep 5.
2. Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness.
Neural Netw. 2020 Oct;130:85-99. doi: 10.1016/j.neunet.2020.06.024. Epub 2020 Jul 3.
3. Dynamics in Deep Classifiers Trained with the Square Loss: Normalization, Low Rank, Neural Collapse, and Generalization Bounds.
Research (Wash D C). 2023 Mar 8;6:0024. doi: 10.34133/research.0024. eCollection 2023.
4. Asymptotic statistical theory of overtraining and cross-validation.
IEEE Trans Neural Netw. 1997;8(5):985-96. doi: 10.1109/72.623200.
5. Stability analysis of stochastic gradient descent for homogeneous neural networks and linear classifiers.
Neural Netw. 2023 Jul;164:382-394. doi: 10.1016/j.neunet.2023.04.028. Epub 2023 Apr 25.
6. Optimizing neural networks for medical data sets: A case study on neonatal apnea prediction.
Artif Intell Med. 2019 Jul;98:59-76. doi: 10.1016/j.artmed.2019.07.008. Epub 2019 Jul 25.
7. Theoretical issues in deep networks.
Proc Natl Acad Sci U S A. 2020 Dec 1;117(48):30039-30045. doi: 10.1073/pnas.1907369117. Epub 2020 Jun 9.
8. Shaping the learning landscape in neural networks around wide flat minima.
Proc Natl Acad Sci U S A. 2020 Jan 7;117(1):161-170. doi: 10.1073/pnas.1908636117. Epub 2019 Dec 23.
9. Highly robust reconstruction framework for three-dimensional optical imaging based on physical model constrained neural networks.
Phys Med Biol. 2024 Mar 21;69(7). doi: 10.1088/1361-6560/ad2ca3.
10. Deep convolutional neural network and IoT technology for healthcare.
Digit Health. 2024 Jan 17;10:20552076231220123. doi: 10.1177/20552076231220123. eCollection 2024 Jan-Dec.

Cited By

1. A normative principle governing memory transfer in cerebellar motor learning.
Nat Commun. 2025 Jul 1;16(1):5479. doi: 10.1038/s41467-025-60511-z.
2. The Intrinsic Dimension of Neural Network Ensembles.
Entropy (Basel). 2025 Apr 18;27(4):440. doi: 10.3390/e27040440.
3. Information FOMO: The Unhealthy Fear of Missing Out on Information-A Method for Removing Misleading Data for Healthier Models.
Entropy (Basel). 2024 Sep 30;26(10):835. doi: 10.3390/e26100835.
4. Explaining neural scaling laws.
Proc Natl Acad Sci U S A. 2024 Jul 2;121(27):e2311878121. doi: 10.1073/pnas.2311878121. Epub 2024 Jun 24.
5. Transition to chaos separates learning regimes and relates to measure of consciousness in recurrent neural networks.
bioRxiv. 2024 May 15:2024.05.15.594236. doi: 10.1101/2024.05.15.594236.
6. Why do probabilistic clinical models fail to transport between sites.
NPJ Digit Med. 2024 Mar 1;7(1):53. doi: 10.1038/s41746-024-01037-4.
7. Modelling dataset bias in machine-learned theories of economic decision-making.
Nat Hum Behav. 2024 Apr;8(4):679-691. doi: 10.1038/s41562-023-01784-6. Epub 2024 Jan 12.
8. An analytical theory of curriculum learning in teacher-student networks.
J Stat Mech. 2022 Nov 1;2022(11):114014. doi: 10.1088/1742-5468/ac9b3c. Epub 2022 Nov 24.
9. Neuromorphic Computing via Fission-based Broadband Frequency Generation.
Adv Sci (Weinh). 2023 Dec;10(35):e2303835. doi: 10.1002/advs.202303835. Epub 2023 Oct 2.
10. Organizing memories for generalization in complementary learning systems.
Nat Neurosci. 2023 Aug;26(8):1438-1448. doi: 10.1038/s41593-023-01382-9. Epub 2023 Jul 20.

References

1. Surprises in high-dimensional ridgeless least squares interpolation.
Ann Stat. 2022 Apr;50(2):949-986. doi: 10.1214/21-aos2133. Epub 2022 Apr 7.
2. Benign overfitting in linear regression.
Proc Natl Acad Sci U S A. 2020 Dec 1;117(48):30063-30070. doi: 10.1073/pnas.1907378117. Epub 2020 Apr 24.
3. Reconciling modern machine-learning practice and the classical bias-variance trade-off.
Proc Natl Acad Sci U S A. 2019 Aug 6;116(32):15849-15854. doi: 10.1073/pnas.1903070116. Epub 2019 Jul 24.
4. Temporal Evolution of Generalization during Learning in Linear Networks.
Neural Comput. 1991 Winter;3(4):589-603. doi: 10.1162/neco.1991.3.4.589.
5. A mean field view of the landscape of two-layer neural networks.
Proc Natl Acad Sci U S A. 2018 Aug 14;115(33):E7665-E7671. doi: 10.1073/pnas.1806579115. Epub 2018 Jul 27.
6. Deep learning.
Nature. 2015 May 28;521(7553):436-44. doi: 10.1038/nature14539.
7. Deep learning in neural networks: an overview.
Neural Netw. 2015 Jan;61:85-117. doi: 10.1016/j.neunet.2014.09.003. Epub 2014 Oct 13.
8. Dynamics of learning near singularities in radial basis function networks.
Neural Netw. 2008 Sep;21(7):989-1005. doi: 10.1016/j.neunet.2008.06.017. Epub 2008 Jul 1.
9. Learning in linear neural networks: a survey.
IEEE Trans Neural Netw. 1995;6(4):837-58. doi: 10.1109/72.392248.
10. Asymptotic statistical theory of overtraining and cross-validation.
IEEE Trans Neural Netw. 1997;8(5):985-96. doi: 10.1109/72.623200.