
Shallow Univariate ReLU Networks as Splines: Initialization, Loss Surface, Hessian, and Gradient Flow Dynamics.

Author Information

Sahs Justin, Pyle Ryan, Damaraju Aneel, Caro Josue Ortega, Tavaslioglu Onur, Lu Andy, Anselmi Fabio, Patel Ankit B

Affiliations

Department of Neuroscience, Baylor College of Medicine, Houston, TX, United States.

Department of Electrical Engineering, Rice University, Houston, TX, United States.

Publication Information

Front Artif Intell. 2022 May 11;5:889981. doi: 10.3389/frai.2022.889981. eCollection 2022.

DOI: 10.3389/frai.2022.889981
PMID: 35647529
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9131019/
Abstract

Understanding the learning dynamics and inductive bias of neural networks (NNs) is hindered by the opacity of the relationship between NN parameters and the function represented. Partially, this is due to symmetries inherent within the NN parameterization, allowing multiple different parameter settings to result in an identical output function, resulting in both an unclear relationship and redundant degrees of freedom. The NN parameterization is invariant under two symmetries: permutation of the neurons and a continuous family of transformations of the scale of weight and bias parameters. We propose taking a quotient with respect to the second symmetry group and reparametrizing ReLU NNs as continuous piecewise linear splines. Using this spline lens, we study learning dynamics in shallow univariate ReLU NNs, finding unexpected insights and explanations for several perplexing phenomena. We develop a surprisingly simple and transparent view of the structure of the loss surface, including its critical and fixed points, Hessian, and Hessian spectrum. We also show that standard weight initializations yield very flat initial functions, and that this flatness, together with overparametrization and the initial weight scale, is responsible for the strength and type of implicit regularization, consistent with previous work. Our implicit regularization results are complementary to recent work, showing that initialization scale critically controls implicit regularization via a kernel-based argument. Overall, removing the weight scale symmetry enables us to prove these results more simply and enables us to prove new results and gain new insights while offering a far more transparent and intuitive picture. Looking forward, our quotiented spline-based approach will extend naturally to the multivariate and deep settings, and alongside the kernel-based view, we believe it will play a foundational role in efforts to understand neural networks. Videos of learning dynamics using a spline-based visualization are available at http://shorturl.at/tFWZ2.
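To make the two central ideas of this abstract concrete — the continuous weight-scale symmetry of the ReLU parameterization, and the view of a shallow univariate network as a continuous piecewise linear spline — here is a minimal NumPy sketch. This is an illustrative sketch under stated assumptions, not code from the paper: the width H, the parameter names w, b, v, c, and the Gaussian initialization are chosen here purely for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shallow univariate ReLU network: f(x) = sum_i v_i * relu(w_i * x + b_i) + c.
# H and the Gaussian initialization are illustrative, not values from the paper.
H = 8                         # hidden width
w = rng.normal(size=H)        # input weights
b = rng.normal(size=H)        # hidden biases
v = rng.normal(size=H)        # output weights
c = 0.0                       # output bias

def f(x, w, b, v, c):
    """Evaluate the network on an array of inputs x."""
    pre = np.outer(x, w) + b              # pre-activations, shape (len(x), H)
    return np.maximum(pre, 0.0) @ v + c   # ReLU, then linear readout

x = np.linspace(-3.0, 3.0, 201)

# Weight-scale symmetry: for any per-neuron alpha_i > 0, the parameters
# (alpha*w, alpha*b, v/alpha) represent exactly the same function, because
# relu(alpha*z) = alpha*relu(z) for alpha > 0. This is the continuous
# symmetry group the paper takes a quotient with respect to.
alpha = rng.uniform(0.5, 2.0, size=H)
assert np.allclose(f(x, w, b, v, c), f(x, alpha * w, alpha * b, v / alpha, c))

# Spline view: neuron i switches on or off at the breakpoint x_i = -b_i / w_i,
# and between consecutive breakpoints f is affine, so f is a continuous
# piecewise linear spline with at most H interior knots.
knots = np.sort(-b / w)
print("spline breakpoints:", np.round(knots, 3))
```

Note that the breakpoint -b_i/w_i and the slope change v_i·w_i at each knot are both invariant under the scaling alpha_i, so quotienting out the per-neuron scale leaves exactly the function-relevant degrees of freedom — this is the spline reparameterization the abstract describes.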

Figures

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b82/9131019/b160fd204c22/frai-05-889981-g0001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b82/9131019/68a9bf0cfbb1/frai-05-889981-g0002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b82/9131019/aaab40876a27/frai-05-889981-g0003.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b82/9131019/bd0a9a21ca5d/frai-05-889981-g0004.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b82/9131019/046bb9375554/frai-05-889981-g0005.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b82/9131019/ade3cbeb130b/frai-05-889981-g0006.jpg
Figure 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b82/9131019/fe51b57db19b/frai-05-889981-g0007.jpg
Figure 8: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b82/9131019/b4903248f2d8/frai-05-889981-g0008.jpg
Figure 9: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b82/9131019/c76a5fa083b9/frai-05-889981-g0009.jpg
Figure 10: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b82/9131019/4de1ae5606e2/frai-05-889981-g0010.jpg
Figure 11: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6b82/9131019/3bdeea653662/frai-05-889981-g0011.jpg

Similar Articles

1. Shallow Univariate ReLU Networks as Splines: Initialization, Loss Surface, Hessian, and Gradient Flow Dynamics. Front Artif Intell. 2022 May 11;5:889981. doi: 10.3389/frai.2022.889981. eCollection 2022.
2. Improved weight initialization for deep and narrow feedforward neural network. Neural Netw. 2024 Aug;176:106362. doi: 10.1016/j.neunet.2024.106362. Epub 2024 May 3.
3. A comparison of deep networks with ReLU activation function and linear spline-type methods. Neural Netw. 2019 Feb;110:232-242. doi: 10.1016/j.neunet.2018.11.005. Epub 2018 Dec 4.
4. Learning in the machine: The symmetries of the deep learning channel. Neural Netw. 2017 Nov;95:110-133. doi: 10.1016/j.neunet.2017.08.008. Epub 2017 Sep 5.
5. ReLU Networks Are Universal Approximators via Piecewise Linear or Constant Functions. Neural Comput. 2020 Nov;32(11):2249-2278. doi: 10.1162/neco_a_01316. Epub 2020 Sep 18.
6. Provable Multi-Task Representation Learning by Two-Layer ReLU Neural Networks. Proc Mach Learn Res. 2024 Jul;235:9292-9345.
7. Magnitude and angle dynamics in training single ReLU neurons. Neural Netw. 2024 Oct;178:106435. doi: 10.1016/j.neunet.2024.106435. Epub 2024 Jun 22.
8. Stochastic Gradient Descent Introduces an Effective Landscape-Dependent Regularization Favoring Flat Solutions. Phys Rev Lett. 2023 Jun 9;130(23):237101. doi: 10.1103/PhysRevLett.130.237101.
9. Optimal approximation of piecewise smooth functions using deep ReLU neural networks. Neural Netw. 2018 Dec;108:296-330. doi: 10.1016/j.neunet.2018.08.019. Epub 2018 Sep 7.
10. Theoretical issues in deep networks. Proc Natl Acad Sci U S A. 2020 Dec 1;117(48):30039-30045. doi: 10.1073/pnas.1907369117. Epub 2020 Jun 9.

Cited By

1. Translational symmetry in convolutions with localized kernels causes an implicit bias toward high frequency adversarial examples. Front Comput Neurosci. 2024 Jun 20;18:1387077. doi: 10.3389/fncom.2024.1387077. eCollection 2024.
2. Two-Argument Activation Functions Learn Soft XOR Operations Like Cortical Neurons. IEEE Access. 2022;10:58071-58080. doi: 10.1109/access.2022.3178951. Epub 2022 May 30.
3. Multiomics, artificial intelligence, and precision medicine in perinatology. Pediatr Res. 2023 Jan;93(2):308-315. doi: 10.1038/s41390-022-02181-x. Epub 2022 Jul 8.

References

1. Emergence of Lie Symmetries in Functional Architectures Learned by CNNs. Front Comput Neurosci. 2021 Nov 22;15:694505. doi: 10.3389/fncom.2021.694505. eCollection 2021.
2. Symmetry-aware reservoir computing. Phys Rev E. 2021 Oct;104(4-2):045307. doi: 10.1103/PhysRevE.104.045307.
3. High-dimensional dynamics of generalization error in neural networks. Neural Netw. 2020 Dec;132:428-446. doi: 10.1016/j.neunet.2020.08.022. Epub 2020 Sep 5.
4. Domain-driven models yield better predictions at lower cost than reservoir computers in Lorenz systems. Philos Trans A Math Phys Eng Sci. 2021 Apr 5;379(2194):20200246. doi: 10.1098/rsta.2020.0246. Epub 2021 Feb 15.
5. A mean field view of the landscape of two-layer neural networks. Proc Natl Acad Sci U S A. 2018 Aug 14;115(33):E7665-E7671. doi: 10.1073/pnas.1806579115. Epub 2018 Jul 27.