

Learning through atypical phase transitions in overparameterized neural networks.

Author information

Baldassi Carlo, Lauditi Clarissa, Malatesta Enrico M, Pacelli Rosalba, Perugini Gabriele, Zecchina Riccardo

Affiliations

Artificial Intelligence Lab, Bocconi University, 20136 Milano, Italy.

Department of Applied Science and Technology, Politecnico di Torino, 10129 Torino, Italy.

Publication information

Phys Rev E. 2022 Jul;106(1-1):014116. doi: 10.1103/PhysRevE.106.014116.

Abstract

Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that defy predictions of statistical learning and pose conceptual challenges for nonconvex optimization. In this paper, we use methods from statistical physics of disordered systems to analytically study the computational fallout of overparameterization in nonconvex binary neural network models, trained on data generated from a structurally simpler but "hidden" network. As the number of connection weights increases, we follow the changes of the geometrical structure of different minima of the error loss function and relate them to learning and generalization performance. A first transition happens at the so-called interpolation point, when solutions begin to exist (perfect fitting becomes possible). This transition reflects the properties of typical solutions, which however are in sharp minima and hard to sample. After a gap, a second transition occurs, with the discontinuous appearance of a different kind of "atypical" structures: wide regions of the weight space that are particularly solution dense and have good generalization properties. The two kinds of solutions coexist, with the typical ones being exponentially more numerous, but empirically we find that efficient algorithms sample the atypical, rare ones. This suggests that the atypical phase transition is the relevant one for learning. The results of numerical tests with realistic networks on observables suggested by the theory are consistent with this scenario.
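The teacher-student setting described above can be pictured with a small numerical sketch. This is purely illustrative and not the authors' code: the paper studies overparameterized nonconvex architectures, whereas the sketch shrinks the setting to a single binary-weight perceptron trained on labels produced by a "hidden" teacher rule, and probes the flatness of a found solution by flipping random subsets of weights and re-measuring the training error (a crude proxy for sitting inside a wide, solution-dense region). The system sizes, the simulated-annealing trainer, and the flip-fraction probe are all assumptions chosen for brevity.

# Minimal, illustrative teacher-student sketch with binary weights (assumed setup).
import numpy as np

rng = np.random.default_rng(0)

N, P = 201, 300                              # student size and number of training examples (assumed)
X = rng.choice([-1, 1], size=(P, N))         # random binary inputs
teacher = rng.choice([-1, 1], size=N)        # structurally simple "hidden" rule
y = np.sign(X @ teacher)                     # labels generated by the teacher (N odd, so never zero)

def errors(w):
    # Number of misclassified training examples for binary weights w.
    return int(np.sum(np.sign(X @ w) != y))

# Crude simulated-annealing search over binary weights (illustrative only).
w = rng.choice([-1, 1], size=N)
e = errors(w)
for step in range(20000):
    i = rng.integers(N)
    w[i] *= -1                               # propose a single-weight flip
    e_new = errors(w)
    beta = 0.5 + step * 1e-3                 # slowly increasing inverse temperature
    if e_new <= e or rng.random() < np.exp(-beta * (e_new - e)):
        e = e_new                            # accept the flip
    else:
        w[i] *= -1                           # reject the flip

# Flatness probe: average training error after flipping a fraction rho of the weights.
def local_error(w, rho, trials=50):
    vals = []
    for _ in range(trials):
        mask = rng.random(N) < rho
        vals.append(errors(np.where(mask, -w, w)))
    return np.mean(vals)

print("training errors at solution:", e)
for rho in (0.01, 0.05, 0.1):
    print(f"mean errors after flipping {rho:.0%} of weights:", local_error(w, rho))

In the scenario the abstract describes, a solution lying in one of the wide, atypical regions would show only a slow growth of this perturbed error as the flip fraction increases, whereas a sharp, typical solution would degrade quickly; the probe above is only a rough proxy for the observables analyzed in the paper.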

