
Learning through atypical phase transitions in overparameterized neural networks.

Author information

Baldassi Carlo, Lauditi Clarissa, Malatesta Enrico M, Pacelli Rosalba, Perugini Gabriele, Zecchina Riccardo

Affiliations

Artificial Intelligence Lab, Bocconi University, 20136 Milano, Italy.

Department of Applied Science and Technology, Politecnico di Torino, 10129 Torino, Italy.

Publication information

Phys Rev E. 2022 Jul;106(1-1):014116. doi: 10.1103/PhysRevE.106.014116.

DOI: 10.1103/PhysRevE.106.014116
PMID: 35974501
Abstract

Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that defy predictions of statistical learning and pose conceptual challenges for nonconvex optimization. In this paper, we use methods from statistical physics of disordered systems to analytically study the computational fallout of overparameterization in nonconvex binary neural network models, trained on data generated from a structurally simpler but "hidden" network. As the number of connection weights increases, we follow the changes of the geometrical structure of different minima of the error loss function and relate them to learning and generalization performance. A first transition happens at the so-called interpolation point, when solutions begin to exist (perfect fitting becomes possible). This transition reflects the properties of typical solutions, which however are in sharp minima and hard to sample. After a gap, a second transition occurs, with the discontinuous appearance of a different kind of "atypical" structures: wide regions of the weight space that are particularly solution dense and have good generalization properties. The two kinds of solutions coexist, with the typical ones being exponentially more numerous, but empirically we find that efficient algorithms sample the atypical, rare ones. This suggests that the atypical phase transition is the relevant one for learning. The results of numerical tests with realistic networks on observables suggested by the theory are consistent with this scenario.
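The teacher-student setup the abstract describes can be made concrete with a toy experiment. The sketch below is an illustration only, not the authors' method: it assumes a single-layer binary perceptron as the hidden teacher, labels random ±1 inputs with its sign output, and lets a binary-weight student search for a zero-error configuration by greedy single-weight flips; the paper itself studies multilayer models and dedicated sampling algorithms. The sweep at the end shows that as the number of samples per weight P/N grows, perfect fitting becomes harder, loosely mirroring the interpolation transition described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_teacher_data(n_weights, n_samples):
    """Hidden 'teacher': a binary perceptron whose sign output labels random inputs."""
    teacher = rng.choice([-1, 1], size=n_weights)
    X = rng.choice([-1, 1], size=(n_samples, n_weights))
    y = np.sign(X @ teacher)  # n_weights is odd, so the dot product is never zero
    return teacher, X, y

def train_binary_student(X, y, n_sweeps=20):
    """Greedy single-weight-flip search for a zero-training-error binary student."""
    n_samples, n_weights = X.shape
    w = rng.choice([-1, 1], size=n_weights)

    def n_errors(w):
        return int(np.sum(np.sign(X @ w) != y))

    best = n_errors(w)
    for _ in range(n_sweeps):
        for i in rng.permutation(n_weights):
            w[i] *= -1             # propose flipping one weight
            e = n_errors(w)
            if e <= best:
                best = e           # accept: training error did not increase
            else:
                w[i] *= -1         # reject: undo the flip
        if best == 0:              # perfect fitting reached
            break
    return w, best

# Fewer samples per weight (more overparameterization) -> perfect fitting is easier.
n_weights = 101
for n_samples in (25, 75, 150):
    teacher, X, y = make_teacher_data(n_weights, n_samples)
    w, train_err = train_binary_student(X, y)
    overlap = float(np.mean(w == teacher))  # crude generalization proxy
    print(f"P/N = {n_samples / n_weights:.2f}  "
          f"training errors = {train_err}  teacher overlap = {overlap:.2f}")
```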


Similar articles

1. Learning through atypical phase transitions in overparameterized neural networks. Phys Rev E. 2022 Jul;106(1-1):014116. doi: 10.1103/PhysRevE.106.014116.
2. Unveiling the Structure of Wide Flat Minima in Neural Networks. Phys Rev Lett. 2021 Dec 31;127(27):278301. doi: 10.1103/PhysRevLett.127.278301.
3. Shaping the learning landscape in neural networks around wide flat minima. Proc Natl Acad Sci U S A. 2020 Jan 7;117(1):161-170. doi: 10.1073/pnas.1908636117. Epub 2019 Dec 23.
4. Typical and atypical solutions in nonconvex neural networks with discrete and continuous weights. Phys Rev E. 2023 Aug;108(2-1):024310. doi: 10.1103/PhysRevE.108.024310.
5. Properties of the Geometry of Solutions and Capacity of Multilayer Neural Networks with Rectified Linear Unit Activations. Phys Rev Lett. 2019 Oct 25;123(17):170602. doi: 10.1103/PhysRevLett.123.170602.
6. Theoretical issues in deep networks. Proc Natl Acad Sci U S A. 2020 Dec 1;117(48):30039-30045. doi: 10.1073/pnas.1907369117. Epub 2020 Jun 9.
7. Role of Synaptic Stochasticity in Training Low-Precision Neural Networks. Phys Rev Lett. 2018 Jun 29;120(26):268103. doi: 10.1103/PhysRevLett.120.268103.
8. Geometry of Energy Landscapes and the Optimizability of Deep Neural Networks. Phys Rev Lett. 2020 Mar 13;124(10):108301. doi: 10.1103/PhysRevLett.124.108301.
9. Memorizing without overfitting: Bias, variance, and interpolation in overparameterized models. Phys Rev Res. 2022 Mar-May;4(1). doi: 10.1103/physrevresearch.4.013201. Epub 2022 Mar 15.
10. High-dimensional dynamics of generalization error in neural networks. Neural Netw. 2020 Dec;132:428-446. doi: 10.1016/j.neunet.2020.08.022. Epub 2020 Sep 5.

Cited by

1. Eight challenges in developing theory of intelligence. Front Comput Neurosci. 2024 Jul 24;18:1388166. doi: 10.3389/fncom.2024.1388166. eCollection 2024.
2. Stochastic Gradient Descent-like relaxation is equivalent to Metropolis dynamics in discrete optimization and inference problems. Sci Rep. 2024 May 21;14(1):11638. doi: 10.1038/s41598-024-62625-8.