
Deep neural networks have an inbuilt Occam's razor.

Author Information

Chris Mingard, Henry Rees, Guillermo Valle-Pérez, Ard A. Louis

Affiliations

Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, UK.

Physical and Theoretical Chemistry Laboratory, University of Oxford, Oxford, UK.

Publication Information

Nat Commun. 2025 Jan 14;16(1):220. doi: 10.1038/s41467-024-54813-x.

Abstract

The remarkable performance of overparameterized deep neural networks (DNNs) must arise from an interplay between network architecture, training algorithms, and structure in the data. To disentangle these three components for supervised learning, we apply a Bayesian picture based on the functions expressed by a DNN. The prior over functions is determined by the network architecture, which we vary by exploiting a transition between ordered and chaotic regimes. For Boolean function classification, we approximate the likelihood using the error spectrum of functions on data. Combining this with the prior yields an accurate prediction for the posterior, measured for DNNs trained with stochastic gradient descent. This analysis shows that structured data, together with a specific Occam's razor-like inductive bias towards (Kolmogorov) simple functions that exactly counteracts the exponential growth of the number of functions with complexity, is a key to the success of DNNs.
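The abstract describes a pipeline of the form P(f|D) ∝ P(D|f) · P(f): a prior over functions induced by the network architecture, combined with a likelihood based on each function's errors on the data. The sketch below is a minimal, hypothetical illustration of that picture, not the authors' code: it estimates the prior by sampling randomly initialised tanh networks on all inputs of a 3-bit Boolean problem, uses a 0/1 indicator of consistency with the training labels as a crude stand-in for the error-spectrum likelihood, and normalises the product to obtain a posterior. The width, weight scale sigma_w, sample count, training subset, and target function are all arbitrary choices for illustration.

```python
import itertools
from collections import Counter

import numpy as np

rng = np.random.default_rng(0)
n_inputs = 3
# All 2^n Boolean inputs, encoded as +/-1 vectors.
X = np.array(list(itertools.product([-1.0, 1.0], repeat=n_inputs)))

def sample_function(width=64, sigma_w=2.0):
    """Sample one randomly initialised tanh network and return the Boolean
    function it expresses, as a tuple of outputs over all 2^n inputs.
    (sigma_w loosely plays the role of the order/chaos control parameter.)"""
    W1 = rng.normal(0.0, sigma_w / np.sqrt(n_inputs), (n_inputs, width))
    b1 = rng.normal(0.0, sigma_w, width)
    W2 = rng.normal(0.0, sigma_w / np.sqrt(width), (width, 1))
    out = np.tanh(X @ W1 + b1) @ W2
    return tuple(bool(v > 0) for v in out.ravel())

# Prior P(f): relative frequency with which each function is expressed
# by randomly initialised networks.
prior = Counter(sample_function() for _ in range(20_000))
total = sum(prior.values())

# Training data D: labels of a simple target (copy the first input bit)
# on 5 of the 8 possible inputs.
target = tuple(bool(v > 0) for v in X[:, 0])
train_idx = [0, 1, 2, 3, 4]

# 0/1 likelihood: P(D | f) = 1 iff f agrees with the target on the
# training set; everything else gets zero posterior mass.
consistent = {f: c for f, c in prior.items()
              if all(f[i] == target[i] for i in train_idx)}
Z = sum(consistent.values())
posterior = {f: c / Z for f, c in consistent.items()}

# Posterior mass concentrates on a few functions: the architecture's
# simplicity bias surviving conditioning on the data.
for f, p in sorted(posterior.items(), key=lambda kv: -kv[1])[:5]:
    print("".join(str(int(b)) for b in f), f"P(f|D) = {p:.3f}")
```

Increasing sigma_w pushes the random networks toward the chaotic regime, spreading the estimated prior over more (and more complex) functions; sweeping it is one simple way to probe, in the spirit of the abstract, how the ordered/chaotic transition reshapes the prior and hence the posterior.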

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1710/11733143/5d5c8e02de27/41467_2024_54813_Fig1_HTML.jpg
