Predicting the outputs of finite deep neural networks trained with noisy gradients.

Authors

Naveh Gadi, Ben David Oded, Sompolinsky Haim, Ringel Zohar

Affiliations

Racah Institute of Physics, Hebrew University, Jerusalem 91904, Israel.

Edmond and Lily Safra Center for Brain Sciences, Hebrew University, Jerusalem 91904, Israel.

Publication

Phys Rev E. 2021 Dec;104(6-1):064301. doi: 10.1103/PhysRevE.104.064301.

Abstract

A recent line of works studied wide deep neural networks (DNNs) by approximating them as Gaussian processes (GPs). A DNN trained with gradient flow was shown to map to a GP governed by the neural tangent kernel (NTK), whereas earlier works showed that a DNN with an i.i.d. prior over its weights maps to the so-called neural network Gaussian process (NNGP). Here we consider a DNN training protocol, involving noise, weight decay, and finite width, whose outcome corresponds to a certain non-Gaussian stochastic process. An analytical framework is then introduced to analyze this non-Gaussian process, whose deviation from a GP is controlled by the finite width. Our contribution is threefold: (i) In the infinite width limit, we establish a correspondence between DNNs trained with noisy gradients and the NNGP, not the NTK. (ii) We provide a general analytical form for the finite width correction (FWC) for DNNs with arbitrary activation functions and depth and use it to predict the outputs of empirical finite networks with high accuracy. Analyzing the FWC behavior as a function of n, the training set size, we find that it is negligible for both the very small n regime, and, surprisingly, for the large n regime [where the GP error scales as O(1/n)]. (iii) We flesh out algebraically how these FWCs can improve the performance of finite convolutional neural networks (CNNs) relative to their GP counterparts on image classification tasks.
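The training protocol described above — gradient descent with weight decay and injected Gaussian noise, i.e. Langevin dynamics, whose stationary distribution the paper analyzes — can be sketched as follows. This is a minimal illustration only: the network sizes and the hyperparameters `eta` (step size), `gamma` (weight decay), and `T` (noise temperature) are arbitrary choices for demonstration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem and a small one-hidden-layer ReLU network
# (sizes are illustrative; the paper's analysis concerns the large-width regime).
n_in, n_hidden, n_train = 2, 64, 16
X = rng.standard_normal((n_train, n_in))
y = np.sin(X[:, 0:1])

W1 = rng.standard_normal((n_in, n_hidden)) / np.sqrt(n_in)
W2 = rng.standard_normal((n_hidden, 1)) / np.sqrt(n_hidden)

eta, gamma, T = 1e-3, 1e-2, 1e-4  # step size, weight decay, noise temperature (assumed values)

def forward(W1, W2, X):
    return np.maximum(X @ W1, 0.0) @ W2

def grads(W1, W2, X, y):
    h = np.maximum(X @ W1, 0.0)
    err = h @ W2 - y                          # dL/df for squared loss
    gW2 = h.T @ err
    gW1 = X.T @ ((err @ W2.T) * (h > 0))
    return gW1, gW2

loss0 = np.mean((forward(W1, W2, X) - y) ** 2)
for _ in range(2000):
    gW1, gW2 = grads(W1, W2, X, y)
    # Langevin update: gradient step + weight decay + injected Gaussian noise.
    W1 += -eta * (gW1 + gamma * W1) + np.sqrt(2 * eta * T) * rng.standard_normal(W1.shape)
    W2 += -eta * (gW2 + gamma * W2) + np.sqrt(2 * eta * T) * rng.standard_normal(W2.shape)
loss1 = np.mean((forward(W1, W2, X) - y) ** 2)
```

At long times this dynamics samples network weights from a Gibbs distribution rather than converging to a single minimizer; the paper's result is that, as the width grows, the resulting distribution over network outputs approaches the NNGP (not the NTK) limit.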

