Going Deeper, Generalizing Better: An Information-Theoretic View for Deep Learning.

Author Information

Zhang Jingwei, Liu Tongliang, Tao Dacheng

Publication Information

IEEE Trans Neural Netw Learn Syst. 2024 Nov;35(11):16683-16695. doi: 10.1109/TNNLS.2023.3297113. Epub 2024 Oct 29.

DOI: 10.1109/TNNLS.2023.3297113
PMID: 37585328
Abstract

Deep learning has transformed computer vision, natural language processing, and speech recognition. However, two critical questions remain unresolved: 1) why do deep neural networks (DNNs) generalize better than shallow networks and 2) does it always hold that a deeper network leads to better performance? In this article, we first show that the expected generalization error of neural networks (NNs) can be upper bounded by the mutual information between the learned features in the last hidden layer and the parameters of the output layer. This bound further implies that as the number of layers increases in the network, the expected generalization error will decrease under mild conditions. Layers with strict information loss, such as the convolutional or pooling layers, reduce the generalization error for the whole network; this answers the first question. However, algorithms with zero expected generalization error do not imply a small test error. This is because the expected training error is large when the information for fitting the data is lost as the number of layers increases. This suggests that the claim "the deeper the better" is conditioned on a small training error. Finally, we show that deep learning satisfies a weak notion of stability and provide some generalization error bounds for noisy stochastic gradient descent (SGD) and binary classification in DNNs.
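To make the first result concrete, the following is a minimal sketch of the kind of bound involved, in the style of the standard sub-Gaussian mutual-information generalization bound of Xu and Raginsky (2017), on which this line of work builds. The notation here is ours, not the paper's: $R(W)$ is the population risk, $R_S(W)$ the empirical risk on $n$ i.i.d. training samples $S$, $T_L$ the learned features in the last hidden layer, and $W$ the output-layer parameters; the paper's exact conditions and constants may differ.

$$
\bigl|\,\mathbb{E}\!\left[R(W) - R_S(W)\right]\bigr|
\;\le\;
\sqrt{\frac{2\sigma^2}{n}\, I(T_L;\, W)},
\qquad \text{assuming the loss is } \sigma\text{-sub-Gaussian.}
$$

The depth claim then has a natural reading via the data-processing inequality: if the representations form a Markov chain $S \to T_1 \to \cdots \to T_L$, then $I(S; T_1) \ge I(S; T_2) \ge \cdots \ge I(S; T_L)$, so each strictly lossy layer (e.g., convolution or pooling) can only shrink the information available for overfitting, tightening the corresponding mutual-information term, while possibly also discarding information needed to fit the data. This is exactly the trade-off the abstract describes: "the deeper the better" holds only so long as the training error stays small.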

Similar Articles

1. Going Deeper, Generalizing Better: An Information-Theoretic View for Deep Learning.
IEEE Trans Neural Netw Learn Syst. 2024 Nov;35(11):16683-16695. doi: 10.1109/TNNLS.2023.3297113. Epub 2024 Oct 29.
2. An Optimal Transport Analysis on Generalization in Deep Learning.
IEEE Trans Neural Netw Learn Syst. 2023 Jun;34(6):2842-2853. doi: 10.1109/TNNLS.2021.3109942. Epub 2023 Jun 1.
3. Deep convolutional neural network and IoT technology for healthcare.
Digit Health. 2024 Jan 17;10:20552076231220123. doi: 10.1177/20552076231220123. eCollection 2024 Jan-Dec.
4. Upper bound of the expected training error of neural network regression for a Gaussian noise sequence.
Neural Netw. 2001 Dec;14(10):1419-29. doi: 10.1016/s0893-6080(01)00122-8.
5. Human Behavior Recognition in Outdoor Sports Based on the Local Error Model and Convolutional Neural Network.
Comput Intell Neurosci. 2022 Jun 28;2022:6988525. doi: 10.1155/2022/6988525. eCollection 2022.
6. Quantifying the generalization error in deep learning in terms of data distribution and neural network smoothness.
Neural Netw. 2020 Oct;130:85-99. doi: 10.1016/j.neunet.2020.06.024. Epub 2020 Jul 3.
7. Universal mean-field upper bound for the generalization gap of deep neural networks.
Phys Rev E. 2022 Jun;105(6-1):064309. doi: 10.1103/PhysRevE.105.064309.
8. Accelerating DNN Training Through Selective Localized Learning.
Front Neurosci. 2022 Jan 11;15:759807. doi: 10.3389/fnins.2021.759807. eCollection 2021.
9. MABAL: a Novel Deep-Learning Architecture for Machine-Assisted Bone Age Labeling.
J Digit Imaging. 2018 Aug;31(4):513-519. doi: 10.1007/s10278-018-0053-3.
10. Causal importance of low-level feature selectivity for generalization in image recognition.
Neural Netw. 2020 May;125:185-193. doi: 10.1016/j.neunet.2020.02.009. Epub 2020 Feb 24.

Cited By

1. An Automated Vertebrae Localization, Segmentation, and Osteoporotic Compression Fracture Detection Pipeline for Computed Tomographic Imaging.
J Imaging Inform Med. 2024 Oct;37(5):2428-2443. doi: 10.1007/s10278-024-01135-5. Epub 2024 May 8.