

Memorizing without overfitting: Bias, variance, and interpolation in overparameterized models

Authors

Rocks Jason W, Mehta Pankaj

Affiliations

Department of Physics, Boston University, Boston, Massachusetts 02215, USA.

Faculty of Computing and Data Sciences, Boston University, Boston, Massachusetts 02215, USA.

Publication

Phys Rev Res. 2022 Mar-May;4(1). doi: 10.1103/physrevresearch.4.013201. Epub 2022 Mar 15.

DOI: 10.1103/physrevresearch.4.013201
PMID: 36713351
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9879296/
Abstract

The bias-variance trade-off is a central concept in supervised learning. In classical statistics, increasing the complexity of a model (e.g., number of parameters) reduces bias but also increases variance. Until recently, it was commonly believed that optimal performance is achieved at intermediate model complexities which strike a balance between bias and variance. Modern Deep Learning methods flout this dogma, achieving state-of-the-art performance using "over-parameterized models" where the number of fit parameters is large enough to perfectly fit the training data. As a result, understanding bias and variance in over-parameterized models has emerged as a fundamental problem in machine learning. Here, we use methods from statistical physics to derive analytic expressions for bias and variance in two minimal models of over-parameterization (linear regression and two-layer neural networks with nonlinear data distributions), allowing us to disentangle properties stemming from the model architecture and random sampling of data. In both models, increasing the number of fit parameters leads to a phase transition where the training error goes to zero and the test error diverges as a result of the variance (while the bias remains finite). Beyond this threshold, the test error of the two-layer neural network decreases due to a monotonic decrease in the bias and variance in contrast with the classical bias-variance trade-off. We also show that in contrast with classical intuition, over-parameterized models can overfit even in the absence of noise and exhibit bias even if the student and teacher models match. We synthesize these results to construct a holistic understanding of generalization error and the bias-variance trade-off in over-parameterized models and relate our results to random matrix theory.
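For readers who want the decomposition spelled out, below is the standard squared-error bias-variance decomposition over random draws of the training set. This is the generic textbook form, not necessarily the exact notation or ensemble-averaging convention used in the paper.

```latex
% Standard bias-variance decomposition of the test error (textbook form;
% notation is generic, not the paper's):
%   \hat{y}_{\mathcal{D}}(x): prediction of a model trained on data set \mathcal{D}
%   f(x): noiseless teacher function;  \bar{y}(x): data-set-averaged prediction
\mathbb{E}_{x,\mathcal{D}}\!\left[\big(\hat{y}_{\mathcal{D}}(x)-f(x)\big)^{2}\right]
  = \underbrace{\mathbb{E}_{x}\!\left[\big(\bar{y}(x)-f(x)\big)^{2}\right]}_{\text{Bias}^{2}}
  + \underbrace{\mathbb{E}_{x,\mathcal{D}}\!\left[\big(\hat{y}_{\mathcal{D}}(x)-\bar{y}(x)\big)^{2}\right]}_{\text{Variance}},
\qquad
\bar{y}(x) \equiv \mathbb{E}_{\mathcal{D}}\!\left[\hat{y}_{\mathcal{D}}(x)\right].
```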

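The double-descent behavior the abstract describes is easy to reproduce numerically. The sketch below (illustrative only, not the paper's code) fits minimum-norm least squares on p random ReLU features to a noiseless linear teacher, in the spirit of the paper's random-feature/two-layer setting: as p crosses the interpolation threshold p = n_train, training error drops to zero while test error spikes, then falls again. All names (w_teacher, random_feature_errors, the chosen sizes) are our own illustrative choices.

```python
# Minimal double-descent sketch: minimum-norm least squares on p random
# ReLU features, noiseless linear teacher. Near p == n_train the feature
# matrix becomes ill-conditioned, so the min-norm interpolator blows up
# the test error (variance) even though there is no label noise.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 50, 1000, 200  # training samples, test points, input dim

# Teacher: noiseless linear function of Gaussian inputs.
w_teacher = rng.normal(size=d) / np.sqrt(d)
X_train = rng.normal(size=(n_train, d))
X_test = rng.normal(size=(n_test, d))
y_train = X_train @ w_teacher
y_test = X_test @ w_teacher

def random_feature_errors(p):
    """Fit min-norm least squares on p random ReLU features; return MSEs."""
    F = rng.normal(size=(d, p)) / np.sqrt(d)   # random first-layer weights
    Z_train = np.maximum(X_train @ F, 0.0)     # ReLU feature map (student)
    Z_test = np.maximum(X_test @ F, 0.0)
    w = np.linalg.pinv(Z_train) @ y_train      # minimum-norm solution
    train_err = np.mean((Z_train @ w - y_train) ** 2)
    test_err = np.mean((Z_test @ w - y_test) ** 2)
    return train_err, test_err

for p in [10, 25, 40, 50, 60, 100, 400, 2000]:
    tr, te = random_feature_errors(p)
    print(f"p={p:5d}  train={tr:.2e}  test={te:.3f}")
```

Averaging over several random seeds smooths the curves; a single run already shows training error collapsing to ~0 at p = 50 and the test-error peak near that threshold.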

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/336c/9879296/985c1bbc2a65/nihms-1866950-f0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/336c/9879296/5a03b76b2bdb/nihms-1866950-f0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/336c/9879296/231c7771c6f2/nihms-1866950-f0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/336c/9879296/fb77fa795014/nihms-1866950-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/336c/9879296/e0b0a7363e1d/nihms-1866950-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/336c/9879296/70652f297a8b/nihms-1866950-f0006.jpg

Similar Articles

1. Memorizing without overfitting: Bias, variance, and interpolation in overparameterized models.
Phys Rev Res. 2022 Mar-May;4(1). doi: 10.1103/physrevresearch.4.013201. Epub 2022 Mar 15.
2. Bias-variance decomposition of overparameterized regression with random linear features.
Phys Rev E. 2022 Aug;106(2-2):025304. doi: 10.1103/PhysRevE.106.025304.
3. Reconciling modern machine-learning practice and the classical bias-variance trade-off.
Proc Natl Acad Sci U S A. 2019 Aug 6;116(32):15849-15854. doi: 10.1073/pnas.1903070116. Epub 2019 Jul 24.
4. Understanding Double Descent Using VC-Theoretical Framework.
IEEE Trans Neural Netw Learn Syst. 2024 Dec;35(12):18838-18847. doi: 10.1109/TNNLS.2024.3388873. Epub 2024 Dec 2.
5. Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks.
Nat Commun. 2021 May 18;12(1):2914. doi: 10.1038/s41467-021-23103-1.
6. Learning through atypical phase transitions in overparameterized neural networks.
Phys Rev E. 2022 Jul;106(1-1):014116. doi: 10.1103/PhysRevE.106.014116.
7. Experimental Design for Overparameterized Learning With Application to Single Shot Deep Active Learning.
IEEE Trans Pattern Anal Mach Intell. 2023 Oct;45(10):11766-11777. doi: 10.1109/TPAMI.2023.3287042. Epub 2023 Sep 5.
8. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
9. Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup.
J Stat Mech. 2020 Dec;2020(12):124010. doi: 10.1088/1742-5468/abc61e. Epub 2020 Dec 21.
10. High-dimensional dynamics of generalization error in neural networks.
Neural Netw. 2020 Dec;132:428-446. doi: 10.1016/j.neunet.2020.08.022. Epub 2020 Sep 5.

Cited By

1. Live-cell omics with Raman spectroscopy.
Microscopy (Oxf). 2025 Jun 26;74(3):189-200. doi: 10.1093/jmicro/dfaf020.
2. Bias in artificial intelligence for medical imaging: fundamentals, detection, avoidance, mitigation, challenges, ethics, and prospects.
Diagn Interv Radiol. 2025 Mar 3;31(2):75-88. doi: 10.4274/dir.2024.242854. Epub 2024 Jul 2.
3. Is this the Dawning of AI for Sarcoidosis?
Lung. 2023 Oct;201(5):443-444. doi: 10.1007/s00408-023-00643-5. Epub 2023 Sep 20.
4. Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge Ensembles.
ArXiv. 2024 Jan 9:arXiv:2307.03176v3.
5. HAL-X: Scalable hierarchical clustering for rapid and tunable single-cell analysis.
PLoS Comput Biol. 2022 Oct 3;18(10):e1010349. doi: 10.1371/journal.pcbi.1010349. eCollection 2022 Oct.
6. Bias-variance decomposition of overparameterized regression with random linear features.
Phys Rev E. 2022 Aug;106(2-2):025304. doi: 10.1103/PhysRevE.106.025304.

References

1. Surprises in high-dimensional ridgeless least squares interpolation.
Ann Stat. 2022 Apr;50(2):949-986. doi: 10.1214/21-aos2133. Epub 2022 Apr 7.
2. High-dimensional dynamics of generalization error in neural networks.
Neural Netw. 2020 Dec;132:428-446. doi: 10.1016/j.neunet.2020.08.022. Epub 2020 Sep 5.
3. A brief prehistory of double descent.
Proc Natl Acad Sci U S A. 2020 May 19;117(20):10625-10626. doi: 10.1073/pnas.2001875117. Epub 2020 May 5.
4. Benign overfitting in linear regression.
Proc Natl Acad Sci U S A. 2020 Dec 1;117(48):30063-30070. doi: 10.1073/pnas.1907378117. Epub 2020 Apr 24.
5. Jamming transition as a paradigm to understand the loss landscape of deep neural networks.
Phys Rev E. 2019 Jul;100(1-1):012115. doi: 10.1103/PhysRevE.100.012115.
6. A high-bias, low-variance introduction to Machine Learning for physicists.
Phys Rep. 2019 May 30;810:1-124. doi: 10.1016/j.physrep.2019.03.001. Epub 2019 Mar 14.
7. Reconciling modern machine-learning practice and the classical bias-variance trade-off.
Proc Natl Acad Sci U S A. 2019 Aug 6;116(32):15849-15854. doi: 10.1073/pnas.1903070116. Epub 2019 Jul 24.
8. Constrained optimization as ecological dynamics with applications to random quadratic programming in high dimensions.
Phys Rev E. 2019 May;99(5-1):052111. doi: 10.1103/PhysRevE.99.052111.
9. Optimal errors and phase transitions in high-dimensional generalized linear models.
Proc Natl Acad Sci U S A. 2019 Mar 19;116(12):5451-5460. doi: 10.1073/pnas.1802705116. Epub 2019 Mar 1.
10. Statistical physics of community ecology: a cavity solution to MacArthur's consumer resource model.
J Stat Mech. 2018 Mar;2018. doi: 10.1088/1742-5468/aab04e. Epub 2018 Mar 20.