Information bottleneck theory of high-dimensional regression: relevancy, efficiency and optimality.

Authors

Vudtiwat Ngampruetikorn, David J. Schwab

Affiliation

Initiative for the Theoretical Sciences, The Graduate Center, CUNY.

Publication

Adv Neural Inf Process Syst. 2022 Dec;35:9784-9796.

Abstract

Avoiding overfitting is a central challenge in machine learning, yet many large neural networks readily achieve zero training loss. This puzzling contradiction necessitates new approaches to the study of overfitting. Here we quantify overfitting via residual information, defined as the bits in fitted models that encode noise in training data. Information-efficient learning algorithms minimize residual information while maximizing the relevant bits, which are predictive of the unknown generative models. We solve this optimization to obtain the information content of optimal algorithms for a linear regression problem and compare it to that of randomized ridge regression. Our results demonstrate the fundamental trade-off between residual and relevant information and characterize the relative information efficiency of randomized regression with respect to optimal algorithms. Finally, using results from random matrix theory, we reveal the information complexity of learning a linear map in high dimensions and unveil information-theoretic analogs of double and multiple descent phenomena.
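
To make the abstract's quantities concrete, one natural formalization (the notation below is ours, inferred from the abstract rather than taken from the paper) treats a learning algorithm as a stochastic map P(W | D) from training data D, generated by an unknown model θ, to a fitted model W. Residual information is then the conditional mutual information I(W; D | θ), the bits of W that encode training noise rather than θ, and relevant information is I(W; θ). Information-efficient algorithms minimize the former while maximizing the latter, which can be written as an information-bottleneck-style Lagrangian:

    \min_{P(W \mid D)} \; I(W; D \mid \theta) \;-\; \gamma\, I(W; \theta), \qquad \gamma \ge 0,

where the multiplier γ sets the exchange rate between relevant bits gained and residual bits incurred; sweeping γ traces out the optimal frontier against which randomized ridge regression is measured.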

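The abstract's reference algorithm, randomized ridge regression, admits a simple generic reading: the ridge estimator with Gaussian noise injected into its output, which makes the algorithm a stochastic map P(W | D) and lets one dial how many bits of the training data the fitted model retains. The sketch below illustrates that reading on synthetic data; all variable names, dimensions, and noise scales are our own choices, not the paper's.

    # Minimal sketch of randomized ridge regression (our illustration; the
    # paper's exact parameterization may differ).
    # Data model: y = X @ theta + noise, with X of shape (m, d).
    import numpy as np

    rng = np.random.default_rng(0)

    m, d = 50, 100             # samples, dimensions (overparameterized regime, d > m)
    sigma = 0.5                # label-noise scale
    lam = 1.0                  # ridge penalty
    tau = 0.1                  # scale of the injected randomness

    theta = rng.standard_normal(d) / np.sqrt(d)   # unknown generative model
    X = rng.standard_normal((m, d))
    y = X @ theta + sigma * rng.standard_normal(m)

    # Deterministic ridge estimate: (X^T X + lam * I)^{-1} X^T y
    w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    # Randomization: inject isotropic Gaussian noise into the estimate, making
    # the fitted model a sample from P(W | D) rather than a deterministic map.
    w_rand = w_ridge + tau * rng.standard_normal(d)

    print("squared error, ridge     :", np.sum((w_ridge - theta) ** 2))
    print("squared error, randomized:", np.sum((w_rand - theta) ** 2))

Setting tau = 0 recovers deterministic ridge; increasing tau discards bits of the training data, lowering residual information at the cost of relevant bits about theta.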
