
Universal mean-field upper bound for the generalization gap of deep neural networks.

Author Information

Ariosto S, Pacelli R, Ginelli F, Gherardi M, Rotondo P

Affiliations

Dipartimento di Scienza e Alta Tecnologia and Center for Nonlinear and Complex Systems, Università degli Studi dell'Insubria, Via Valleggio 11, 22100 Como, Italy.

I.N.F.N. Sezione di Milano, Via Celoria 16, 20133 Milan, Italy.

Publication Information

Phys Rev E. 2022 Jun;105(6-1):064309. doi: 10.1103/PhysRevE.105.064309.

Abstract

Modern deep neural networks (DNNs) represent a formidable challenge for theorists: according to the commonly accepted probabilistic framework that describes their performance, these architectures should overfit due to the huge number of parameters to train, but in practice they do not. Here we employ results from replica mean-field theory to compute the generalization gap of machine learning models with quenched features, in the teacher-student scenario and for regression problems with quadratic loss function. Notably, this framework includes the case of DNNs where the last layer is optimized given a specific realization of the remaining weights. We show how these results, combined with ideas from statistical learning theory, provide a stringent asymptotic upper bound on the generalization gap of fully trained DNNs as a function of the dataset size P. In particular, in the limit of large P and N_{out} (where N_{out} is the size of the last layer), with N_{out}≪P, the generalization gap approaches zero faster than 2N_{out}/P, for any choice of both architecture and teacher function. Notably, this result greatly improves existing bounds from statistical learning theory. We test our predictions on a broad range of architectures, from toy fully connected neural networks with few hidden layers to state-of-the-art deep convolutional neural networks.
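The setting described in the abstract (a student with quenched features whose last layer is trained with a quadratic loss on data generated by a teacher) can be probed numerically. The sketch below is an illustrative assumption, not the authors' code: the ReLU feature map, the tanh teacher, the small ridge term, and the normalizations are all choices made for the example, and the exact normalization of the gap in the paper may differ. It simply estimates the empirical generalization gap and prints the 2N_{out}/P scale from the abstract for comparison.

```python
# Illustrative sketch (assumptions, not the paper's code): teacher-student
# regression with a quenched random-feature student whose last layer is
# trained by least squares, compared against the 2*N_out/P scale.
import numpy as np

rng = np.random.default_rng(0)

D = 50         # input dimension
N_out = 100    # size of the last (trained) layer
P = 5000       # number of training examples (N_out << P)
P_test = 5000  # number of test examples

# Quenched (fixed, untrained) feature map: x -> relu(W x) / sqrt(D)
W = rng.standard_normal((N_out, D))
def features(X):
    return np.maximum(X @ W.T, 0.0) / np.sqrt(D)

# A simple random nonlinear teacher acting on the raw inputs (an assumption).
w_teacher = rng.standard_normal(D) / np.sqrt(D)
def teacher(X):
    return np.tanh(X @ w_teacher)

X_train = rng.standard_normal((P, D))
X_test = rng.standard_normal((P_test, D))
y_train, y_test = teacher(X_train), teacher(X_test)

# Train only the last-layer weights with quadratic loss (least squares;
# the tiny ridge term is only for numerical stability).
Phi = features(X_train)
a = np.linalg.solve(Phi.T @ Phi + 1e-8 * np.eye(N_out), Phi.T @ y_train)

train_err = np.mean((Phi @ a - y_train) ** 2)
test_err = np.mean((features(X_test) @ a - y_test) ** 2)

print(f"generalization gap  : {test_err - train_err:.4e}")
print(f"2*N_out/P reference : {2 * N_out / P:.4e}")
```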

