
Unreasonable effectiveness of learning neural networks: From accessible states and robust ensembles to basic algorithmic schemes.

Authors

Baldassi Carlo, Borgs Christian, Chayes Jennifer T, Ingrosso Alessandro, Lucibello Carlo, Saglietti Luca, Zecchina Riccardo

Affiliations

Department of Applied Science and Technology, Politecnico di Torino, I-10129 Torino, Italy;

Human Genetics Foundation-Torino, I-10126 Torino, Italy.

Publication

Proc Natl Acad Sci U S A. 2016 Nov 29;113(48):E7655-E7662. doi: 10.1073/pnas.1608103113. Epub 2016 Nov 15.

Abstract

In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here, we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare, but extremely dense and accessible, regions of configurations in the network weight space. We define a measure, the robust ensemble (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models and also provide a general algorithmic scheme that is straightforward to implement: define a cost function given by a sum of a finite number of replicas of the original cost function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful algorithms, ranging from Markov chains to message passing to gradient descent processes, where the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems.
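The algorithmic scheme described in the abstract, a cost given by a sum of replicas of the original cost function with a constraint pulling the replicas toward a common center, can be sketched in a few lines. The following is a minimal illustrative toy, not the paper's implementation: it uses continuous rather than discrete weights, a perceptron-style hinge cost, and the replicas' mean as a stand-in for the driving assignment; the data sizes and hyperparameters (`y_replicas`, `gamma`, `lr`) are arbitrary choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy realizable classification task: labels produced by a random
# "teacher" weight vector, so a perfect solution exists.
n, p = 20, 200
teacher = rng.normal(size=n)
X = rng.normal(size=(p, n))
labels = np.sign(X @ teacher)

def loss_grad(w):
    """Gradient of the perceptron cost sum_mu max(0, -label_mu * (x_mu . w))."""
    margins = labels * (X @ w)
    active = margins < 0                      # currently misclassified patterns
    return -(labels[active, None] * X[active]).sum(axis=0)

y_replicas, gamma, lr, steps = 5, 0.1, 0.01, 500
W = rng.normal(size=(y_replicas, n))          # one weight vector per replica

for _ in range(steps):
    center = W.mean(axis=0)                   # proxy for the driving assignment
    for a in range(y_replicas):
        # Each replica descends its own cost plus an elastic coupling
        # term gamma * (w_a - center) that keeps the replicas close.
        W[a] -= lr * (loss_grad(W[a]) + gamma * (W[a] - center))

center = W.mean(axis=0)
train_acc = (np.sign(X @ center) == labels).mean()
print(f"center training accuracy: {train_acc:.2f}")
```

The elastic coupling is what biases the search toward wide, dense regions of solutions: a replica sitting in an isolated minimum is pulled away by the others, while a region that accommodates all replicas at once is stable under the joint dynamics.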


Similar Articles

Shaping the learning landscape in neural networks around wide flat minima.
Proc Natl Acad Sci U S A. 2020 Jan 7;117(1):161-170. doi: 10.1073/pnas.1908636117. Epub 2019 Dec 23.

Learning may need only a few bits of synaptic precision.
Phys Rev E. 2016 May;93(5):052313. doi: 10.1103/PhysRevE.93.052313. Epub 2016 May 27.

Cited By

Eight challenges in developing theory of intelligence.
Front Comput Neurosci. 2024 Jul 24;18:1388166. doi: 10.3389/fncom.2024.1388166. eCollection 2024.

Machine learning meets physics: A two-way street.
Proc Natl Acad Sci U S A. 2024 Jul 2;121(27):e2403580121. doi: 10.1073/pnas.2403580121. Epub 2024 Jun 24.

On the different regimes of stochastic gradient descent.
Proc Natl Acad Sci U S A. 2024 Feb 27;121(9):e2316301121. doi: 10.1073/pnas.2316301121. Epub 2024 Feb 20.

PAC Bayesian Performance Guarantees for Deep (Stochastic) Networks in Medical Imaging.
Med Image Comput Comput Assist Interv. 2021 Sep-Oct;12903:560-570. doi: 10.1007/978-3-030-87199-4_53. Epub 2021 Sep 21.

Archetypal landscapes for deep neural networks.
Proc Natl Acad Sci U S A. 2020 Sep 8;117(36):21857-21864. doi: 10.1073/pnas.1919995117. Epub 2020 Aug 25.

References

Learning may need only a few bits of synaptic precision.
Phys Rev E. 2016 May;93(5):052313. doi: 10.1103/PhysRevE.93.052313. Epub 2016 May 27.

Deep learning.
Nature. 2015 May 28;521(7553):436-44. doi: 10.1038/nature14539.

Origin of the computational hardness for learning with binary synapses.
Phys Rev E Stat Nonlin Soft Matter Phys. 2014 Nov;90(5-1):052813. doi: 10.1103/PhysRevE.90.052813. Epub 2014 Nov 17.

Locked constraint satisfaction problems.
Phys Rev Lett. 2008 Aug 15;101(7):078702. doi: 10.1103/PhysRevLett.101.078702.

Entropy landscape and non-Gibbs solutions in constraint satisfaction problems.
Phys Rev E Stat Nonlin Soft Matter Phys. 2008 Mar;77(3 Pt 1):031118. doi: 10.1103/PhysRevE.77.031118. Epub 2008 Mar 17.
