Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness.

Affiliations

LSEC, ICMSEC, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.

Division of Applied Mathematics, Brown University, Providence, RI 02912, USA.

Publication information

Neural Netw. 2020 Oct;130:85-99. doi: 10.1016/j.neunet.2020.06.024. Epub 2020 Jul 3.

Abstract

The accuracy of deep learning, i.e., deep neural networks, can be characterized by dividing the total error into three main types: approximation error, optimization error, and generalization error. Whereas there are some satisfactory answers to the problems of approximation and optimization, much less is known about the theory of generalization. Most existing theoretical works for generalization fail to explain the performance of neural networks in practice. To derive a meaningful bound, we study the generalization error of neural networks for classification problems in terms of data distribution and neural network smoothness. We introduce the cover complexity (CC) to measure the difficulty of learning a data set and the inverse of the modulus of continuity to quantify neural network smoothness. A quantitative bound for expected accuracy/error is derived by considering both the CC and neural network smoothness. Although most of the analysis is general and not specific to neural networks, we validate our theoretical assumptions and results numerically for neural networks by several data sets of images. The numerical results confirm that the expected error of trained networks scaled with the square root of the number of classes has a linear relationship with respect to the CC. We also observe a clear consistency between test loss and neural network smoothness during the training process. In addition, we demonstrate empirically that the neural network smoothness decreases when the network size increases whereas the smoothness is insensitive to training dataset size.
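For readers unfamiliar with the smoothness measure mentioned above, the sketch below states the textbook definition of the modulus of continuity and its inverse. It is an assumption in the sense that the paper's precise variant (choice of norm, domain, and normalization) may differ; it is not the authors' exact formulation.

```latex
% Standard (textbook) modulus of continuity of a network f on an input domain X.
% A smaller \omega_f at a given \delta means a smoother network.
\omega_f(\delta) \;=\; \sup_{\substack{x,\, y \in X \\ \|x - y\| \le \delta}} \big\| f(x) - f(y) \big\|

% Its inverse, the quantity the abstract refers to as the smoothness measure:
% the largest input perturbation \delta the network tolerates before its output
% can change by more than \varepsilon.
\omega_f^{-1}(\varepsilon) \;=\; \sup \{\, \delta \ge 0 \;:\; \omega_f(\delta) \le \varepsilon \,\}
```

Under this reading, a larger inverse modulus indicates a smoother network, which is consistent with the abstract's observation that smoothness decreases as the network size increases.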
