Baldassi Carlo, Borgs Christian, Chayes Jennifer T, Ingrosso Alessandro, Lucibello Carlo, Saglietti Luca, Zecchina Riccardo
Department of Applied Science and Technology, Politecnico di Torino, I-10129 Torino, Italy;
Human Genetics Foundation-Torino, I-10126 Torino, Italy.
Proc Natl Acad Sci U S A. 2016 Nov 29;113(48):E7655-E7662. doi: 10.1073/pnas.1608103113. Epub 2016 Nov 15.
In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here, we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare, but extremely dense and accessible, regions of configurations in the network weight space. We define a measure, the robust ensemble (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models and also provide a general algorithmic scheme that is straightforward to implement: define a cost function given by a sum of a finite number of replicas of the original cost function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful algorithms, ranging from Markov chains to message passing to gradient-descent processes, in which the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems.
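The following is a minimal Python/NumPy sketch of the replicated-cost idea summarized in the abstract: a finite number of replicas of a generic cost function E, each coupled to a center configuration that plays the role of the driving assignment. The quadratic coupling, the choice of the replica barycenter as the center, and all parameter names (gamma, lr, n_replicas) are illustrative assumptions made here for concreteness; they are not the paper's exact algorithms, which also include Markov-chain and message-passing variants and discrete weights.

```python
import numpy as np

def replicated_descent(E, grad_E, w0, n_replicas=5, gamma=0.5,
                       lr=0.02, steps=2000, seed=0):
    """Gradient descent on a replicated cost (illustrative sketch).

    Total cost: sum_a E(w_a) + (gamma / 2) * sum_a ||w_a - w_center||^2,
    where w_center is taken to be the mean of the replicas at each step
    (an assumed stand-in for the 'driving assignment' of the abstract).
    """
    rng = np.random.default_rng(seed)
    # Initialize replicas as small perturbations of a common starting point.
    replicas = [w0 + 0.01 * rng.standard_normal(w0.shape) for _ in range(n_replicas)]
    for _ in range(steps):
        center = np.mean(replicas, axis=0)  # current driving configuration
        for a in range(n_replicas):
            # Gradient of the replica's own cost plus the coupling toward the center.
            g = grad_E(replicas[a]) + gamma * (replicas[a] - center)
            replicas[a] = replicas[a] - lr * g
    return np.mean(replicas, axis=0)

if __name__ == "__main__":
    # Toy rough landscape: a convex bowl with sinusoidal ripples (purely illustrative).
    E = lambda w: 0.5 * np.sum(w ** 2) + np.sum(np.cos(8.0 * w))
    grad_E = lambda w: w - 8.0 * np.sin(8.0 * w)
    w_star = replicated_descent(E, grad_E, w0=np.full(10, 2.0))
    print("final cost:", E(w_star))
```

The coupling term pulls the replicas toward configurations whose neighborhoods are also low-cost, which is the mechanism the abstract describes for favoring dense, accessible regions over isolated minima.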