

Archetypal landscapes for deep neural networks.

Affiliations

Department of Physics, University of Cambridge, Cambridge CB3 0HE, United Kingdom.

Publication information

Proc Natl Acad Sci U S A. 2020 Sep 8;117(36):21857-21864. doi: 10.1073/pnas.1919995117. Epub 2020 Aug 25.

DOI: 10.1073/pnas.1919995117
PMID: 32843349
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7486703/
Abstract

The predictive capabilities of deep neural networks (DNNs) continue to evolve to increasingly impressive levels. However, it is still unclear how training procedures for DNNs succeed in finding parameters that produce good results for such high-dimensional and nonconvex loss functions. In particular, we wish to understand why simple optimization schemes, such as stochastic gradient descent, do not end up trapped in local minima with high loss values that would not yield useful predictions. We explain the optimizability of DNNs by characterizing the local minima and transition states of the loss-function landscape (LFL) along with their connectivity. We show that the LFL of a DNN in the shallow network or data-abundant limit is funneled, and thus easy to optimize. Crucially, in the opposite low-data/deep limit, although the number of minima increases, the landscape is characterized by many minima with similar loss values separated by low barriers. This organization is different from the hierarchical landscapes of structural glass formers and explains why minimization procedures commonly employed by the machine-learning community can navigate the LFL successfully and reach low-lying solutions.
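The abstract's central picture — many minima with similar loss values, so that independent optimization runs land at comparably good solutions — can be illustrated with a small numerical sketch. This is not the paper's methodology (the authors characterize minima and transition states of the full landscape); it is only a hedged toy experiment, and the network size, learning rate, step count, and seeds below are arbitrary choices. It trains the same tiny network on XOR (cf. the cited XOR loss-surface study) from several random initializations and compares the final losses.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_xor(seed, hidden=3, lr=1.0, steps=5000):
    """Train a tiny 2-layer sigmoid net on XOR with plain
    full-batch gradient descent; return the final MSE loss."""
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
    y = np.array([[0], [1], [1], [0]], float)
    W1 = rng.normal(0, 1, (2, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 1, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(steps):
        h = sigmoid(X @ W1 + b1)      # hidden activations
        out = sigmoid(h @ W2 + b2)    # network output
        err = out - y
        # backpropagation for the MSE loss
        d_out = err * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(0)
        W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(0)
    return float(np.mean(err ** 2))

# Independent random starts: different initializations reach different
# minima, yet the low-lying ones have very similar (near-zero) loss.
losses = [train_xor(s) for s in range(10)]
print([round(l, 4) for l in losses])
```

In this toy setting some seeds may still stall in a poor minimum; the point is only that many independent starts reach low, mutually similar loss values, which is the qualitative behavior the abstract attributes to funneled or low-barrier landscapes.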


Similar articles

1. Archetypal landscapes for deep neural networks.
   Proc Natl Acad Sci U S A. 2020 Sep 8;117(36):21857-21864. doi: 10.1073/pnas.1919995117. Epub 2020 Aug 25.
2. Geometry of Energy Landscapes and the Optimizability of Deep Neural Networks.
   Phys Rev Lett. 2020 Mar 13;124(10):108301. doi: 10.1103/PhysRevLett.124.108301.
3. Anomalous diffusion dynamics of learning in deep neural networks.
   Neural Netw. 2022 May;149:18-28. doi: 10.1016/j.neunet.2022.01.019. Epub 2022 Feb 3.
4. Shaping the learning landscape in neural networks around wide flat minima.
   Proc Natl Acad Sci U S A. 2020 Jan 7;117(1):161-170. doi: 10.1073/pnas.1908636117. Epub 2019 Dec 23.
5. A mean field view of the landscape of two-layer neural networks.
   Proc Natl Acad Sci U S A. 2018 Aug 14;115(33):E7665-E7671. doi: 10.1073/pnas.1806579115. Epub 2018 Jul 27.
6. Unveiling the Structure of Wide Flat Minima in Neural Networks.
   Phys Rev Lett. 2021 Dec 31;127(27):278301. doi: 10.1103/PhysRevLett.127.278301.
7. Energy landscapes of resting-state brain networks.
   Front Neuroinform. 2014 Feb 25;8:12. doi: 10.3389/fninf.2014.00012. eCollection 2014.
8. The inverse variance-flatness relation in stochastic gradient descent is critical for finding flat minima.
   Proc Natl Acad Sci U S A. 2021 Mar 2;118(9). doi: 10.1073/pnas.2015617118.
9. Machine learning landscapes and predictions for patient outcomes.
   R Soc Open Sci. 2017 Jul 26;4(7):170175. doi: 10.1098/rsos.170175. eCollection 2017 Jul.
10. Loss surface of XOR artificial neural networks.
    Phys Rev E. 2018 May;97(5-1):052307. doi: 10.1103/PhysRevE.97.052307.

Cited by

1. Optimization on multifractal loss landscapes explains a diverse range of geometrical and dynamical properties of deep learning.
   Nat Commun. 2025 Apr 5;16(1):3252. doi: 10.1038/s41467-025-58532-9.
2. Design principles for energy transfer in the photosystem II supercomplex from kinetic transition networks.
   Nat Commun. 2024 Oct 9;15(1):8763. doi: 10.1038/s41467-024-53138-z.
3. Integration of Machine Learning and Coarse-Grained Molecular Simulations for Polymer Materials: Physical Understandings and Molecular Design.
   Front Chem. 2022 Jan 24;9:820417. doi: 10.3389/fchem.2021.820417. eCollection 2021.

References

1. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.
   Science. 2018 Dec 7;362(6419):1140-1144. doi: 10.1126/science.aar6404.
2. A mean field view of the landscape of two-layer neural networks.
   Proc Natl Acad Sci U S A. 2018 Aug 14;115(33):E7665-E7671. doi: 10.1073/pnas.1806579115. Epub 2018 Jul 27.
3. Loss surface of XOR artificial neural networks.
   Phys Rev E. 2018 May;97(5-1):052307. doi: 10.1103/PhysRevE.97.052307.
4. Exploring Energy Landscapes.
   Annu Rev Phys Chem. 2018 Apr 20;69:401-425. doi: 10.1146/annurev-physchem-050317-021219.
5. Pathways for diffusion in the potential energy landscape of the network glass former SiO2.
   J Chem Phys. 2017 Oct 21;147(15):152726. doi: 10.1063/1.5005924.
6. Exploring biomolecular energy landscapes.
   Chem Commun (Camb). 2017 Jun 27;53(52):6974-6988. doi: 10.1039/c7cc02413d.
7. Defining and quantifying frustration in the energy landscape: Applications to atomic and molecular clusters, biomolecules, jammed and glassy systems.
   J Chem Phys. 2017 Mar 28;146(12):124103. doi: 10.1063/1.4977794.
8. Energy landscapes for machine learning.
   Phys Chem Chem Phys. 2017 May 24;19(20):12585-12603. doi: 10.1039/c7cp01108c.
9. Unreasonable effectiveness of learning neural networks: From accessible states and robust ensembles to basic algorithmic schemes.
   Proc Natl Acad Sci U S A. 2016 Nov 29;113(48):E7655-E7662. doi: 10.1073/pnas.1608103113. Epub 2016 Nov 15.
10. Dynamics of a molecular glass former: Energy landscapes for diffusion in ortho-terphenyl.
    J Chem Phys. 2016 Jul 14;145(2):024505. doi: 10.1063/1.4954324.