Mingard Chris, Rees Henry, Valle-Pérez Guillermo, Louis Ard A
Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, UK.
Physical and Theoretical Chemistry Laboratory, University of Oxford, Oxford, UK.
Nat Commun. 2025 Jan 14;16(1):220. doi: 10.1038/s41467-024-54813-x.
The remarkable performance of overparameterized deep neural networks (DNNs) must arise from an interplay between network architecture, training algorithms, and structure in the data. To disentangle these three components for supervised learning, we apply a Bayesian picture based on the functions expressed by a DNN. The prior over functions is determined by the network architecture, which we vary by exploiting a transition between ordered and chaotic regimes. For Boolean function classification, we approximate the likelihood using the error spectrum of functions on data. Combining this with the prior yields an accurate prediction for the posterior, measured for DNNs trained with stochastic gradient descent. This analysis shows that structured data, together with a specific Occam's razor-like inductive bias towards (Kolmogorov) simple functions that exactly counteracts the exponential growth of the number of functions with complexity, is a key to the success of DNNs.
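A minimal sketch of the Bayesian picture the abstract refers to (schematic only; the symbols and constants below are illustrative assumptions, not values taken from the paper): the prior P(f) over functions f is induced by the network architecture via random parameters, the likelihood P(D|f) scores f against the training data D (approximated here through the error spectrum), and Bayes' rule gives the posterior that is compared with the functions found by SGD-trained DNNs,

    P(f \mid D) = \frac{P(D \mid f)\, P(f)}{P(D)}.

The Occam's razor-like inductive bias can be summarized schematically as P(f) \lesssim 2^{-a K(f) + b} for Kolmogorov complexity K(f) and some constants a, b, while the number of functions with complexity near K grows roughly as 2^{K}; the exponential decay of the prior counteracts the exponential growth in the number of functions, as the abstract states.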