Predicting the outputs of finite deep neural networks trained with noisy gradients.

Authors

Naveh Gadi, Ben David Oded, Sompolinsky Haim, Ringel Zohar

Affiliations

Racah Institute of Physics, Hebrew University, Jerusalem 91904, Israel.

Edmond and Lily Safra Center for Brain Sciences, Hebrew University, Jerusalem 91904, Israel.

Publication

Phys Rev E. 2021 Dec;104(6-1):064301. doi: 10.1103/PhysRevE.104.064301.

Abstract

A recent line of works studied wide deep neural networks (DNNs) by approximating them as Gaussian processes (GPs). A DNN trained with gradient flow was shown to map to a GP governed by the neural tangent kernel (NTK), whereas earlier works showed that a DNN with an i.i.d. prior over its weights maps to the so-called neural network Gaussian process (NNGP). Here we consider a DNN training protocol, involving noise, weight decay, and finite width, whose outcome corresponds to a certain non-Gaussian stochastic process. An analytical framework is then introduced to analyze this non-Gaussian process, whose deviation from a GP is controlled by the finite width. Our contribution is threefold: (i) In the infinite width limit, we establish a correspondence between DNNs trained with noisy gradients and the NNGP, not the NTK. (ii) We provide a general analytical form for the finite width correction (FWC) for DNNs with arbitrary activation functions and depth and use it to predict the outputs of empirical finite networks with high accuracy. Analyzing the FWC behavior as a function of n, the training set size, we find that it is negligible for both the very small n regime, and, surprisingly, for the large n regime [where the GP error scales as O(1/n)]. (iii) We flesh out algebraically how these FWCs can improve the performance of finite convolutional neural networks (CNNs) relative to their GP counterparts on image classification tasks.
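The training protocol described above — gradient descent with weight decay and injected Gaussian noise, i.e. Langevin dynamics, whose stationary distribution the paper analyzes — can be sketched as follows. This is a minimal illustration only: the network sizes and the hyperparameters `eta` (step size), `gamma` (weight decay), and `T` (noise temperature) are arbitrary choices for demonstration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem and a small one-hidden-layer ReLU network
# (sizes are illustrative; the paper's analysis concerns the large-width regime).
n_in, n_hidden, n_train = 2, 64, 16
X = rng.standard_normal((n_train, n_in))
y = np.sin(X[:, 0:1])

W1 = rng.standard_normal((n_in, n_hidden)) / np.sqrt(n_in)
W2 = rng.standard_normal((n_hidden, 1)) / np.sqrt(n_hidden)

eta, gamma, T = 1e-3, 1e-2, 1e-4  # step size, weight decay, noise temperature (assumed values)

def forward(W1, W2, X):
    return np.maximum(X @ W1, 0.0) @ W2

def grads(W1, W2, X, y):
    h = np.maximum(X @ W1, 0.0)
    err = h @ W2 - y                          # dL/df for squared loss
    gW2 = h.T @ err
    gW1 = X.T @ ((err @ W2.T) * (h > 0))
    return gW1, gW2

loss0 = np.mean((forward(W1, W2, X) - y) ** 2)
for _ in range(2000):
    gW1, gW2 = grads(W1, W2, X, y)
    # Langevin update: gradient step + weight decay + injected Gaussian noise.
    W1 += -eta * (gW1 + gamma * W1) + np.sqrt(2 * eta * T) * rng.standard_normal(W1.shape)
    W2 += -eta * (gW2 + gamma * W2) + np.sqrt(2 * eta * T) * rng.standard_normal(W2.shape)
loss1 = np.mean((forward(W1, W2, X) - y) ** 2)
```

At long times this dynamics samples network weights from a Gibbs distribution rather than converging to a single minimizer; the paper's result is that, as the width grows, the resulting distribution over network outputs approaches the NNGP (not the NTK) limit.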

