Stochastic Gradient Descent-like relaxation is equivalent to Metropolis dynamics in discrete optimization and inference problems.

Authors

Angelini Maria Chiara, Cavaliere Angelo Giorgio, Marino Raffaele, Ricci-Tersenghi Federico

Affiliations

Dipartimento di Fisica, Sapienza Università di Roma, P.le Aldo Moro 5, 00185, Rome, Italy.

Istituto Nazionale di Fisica Nucleare, Sezione di Roma I, P.le A. Moro 5, 00185, Rome, Italy.

Publication Information

Sci Rep. 2024 May 21;14(1):11638. doi: 10.1038/s41598-024-62625-8.

DOI: 10.1038/s41598-024-62625-8
PMID: 38773255
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11639713/
Abstract

Is Stochastic Gradient Descent (SGD) substantially different from Metropolis Monte Carlo dynamics? This is a fundamental question at the time of understanding the most used training algorithm in the field of Machine Learning, but it received no answer until now. Here we show that in discrete optimization and inference problems, the dynamics of an SGD-like algorithm resemble very closely that of Metropolis Monte Carlo with a properly chosen temperature, which depends on the mini-batch size. This quantitative matching holds both at equilibrium and in the out-of-equilibrium regime, despite the two algorithms having fundamental differences (e.g. SGD does not satisfy detailed balance). Such equivalence allows us to use results about performances and limits of Monte Carlo algorithms to optimize the mini-batch size in the SGD-like algorithm and make it efficient at recovering the signal in hard inference problems.
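
The abstract's claim can be made concrete with a small numerical sketch. The toy example below is not the authors' code: the Ising-like cost on a random coupling graph, the greedy accept rule on the mini-batch, the batch size, and the Metropolis temperature T are all illustrative assumptions. It runs the two dynamics the paper compares side by side: an SGD-like move that flips a spin whenever a random mini-batch of energy terms does not penalize the flip, and a standard Metropolis move on the full energy. The paper's result is that the first behaves like the second at an effective temperature set by the mini-batch size (smaller batches give noisier energy estimates, i.e. a higher effective temperature).

```python
# Minimal illustrative sketch (assumptions noted above, not the paper's code):
# SGD-like mini-batch relaxation vs. Metropolis dynamics on a toy Ising-like cost.
import numpy as np

rng = np.random.default_rng(0)

N, M = 200, 1000                              # spins, interaction terms
terms = rng.integers(0, N, size=(M, 2))       # random pairs (i, j)
J = rng.choice([-1.0, 1.0], size=M)           # random couplings

def term_energy(s, idx):
    """Energy contribution of the selected interaction terms."""
    i, j = terms[idx, 0], terms[idx, 1]
    return -np.sum(J[idx] * s[i] * s[j])

def sgd_like_step(s, batch_size):
    """Flip a random spin iff a random mini-batch of terms does not penalize it."""
    k = rng.integers(N)
    batch = rng.choice(M, size=batch_size, replace=False)
    e0 = term_energy(s, batch)
    s[k] = -s[k]
    if term_energy(s, batch) > e0:            # reject: mini-batch energy went up
        s[k] = -s[k]

def metropolis_step(s, T):
    """Standard Metropolis: accept uphill moves with probability exp(-dE / T)."""
    k = rng.integers(N)
    all_idx = np.arange(M)
    e0 = term_energy(s, all_idx)
    s[k] = -s[k]
    dE = term_energy(s, all_idx) - e0
    if dE > 0 and rng.random() >= np.exp(-dE / T):
        s[k] = -s[k]                          # reject

s1 = rng.choice([-1.0, 1.0], size=N)
s2 = s1.copy()
for _ in range(5000):
    sgd_like_step(s1, batch_size=50)          # smaller batches act like higher T
    metropolis_step(s2, T=0.5)                # T is a hand-picked placeholder

print("SGD-like energy:  ", term_energy(s1, np.arange(M)))
print("Metropolis energy:", term_energy(s2, np.arange(M)))
```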

Figures (from the PMC full text):
Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5822/11639713/a8505a5b66b3/41598_2024_62625_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5822/11639713/7fd3bd9cacce/41598_2024_62625_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5822/11639713/d225a0dff713/41598_2024_62625_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5822/11639713/3f3e4a0e9839/41598_2024_62625_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5822/11639713/b8c3792ffd60/41598_2024_62625_Fig5_HTML.jpg

Similar Articles

1. Stochastic Gradient Descent-like relaxation is equivalent to Metropolis dynamics in discrete optimization and inference problems. Sci Rep. 2024 May 21;14(1):11638. doi: 10.1038/s41598-024-62625-8.
2. Parameter inference for discretely observed stochastic kinetic models using stochastic gradient descent. BMC Syst Biol. 2010 Jul 21;4:99. doi: 10.1186/1752-0509-4-99.
3. Optimization and Learning With Randomly Compressed Gradient Updates. Neural Comput. 2023 Jun 12;35(7):1234-1287. doi: 10.1162/neco_a_01588.
4. Weighted SGD for ℓp Regression with Randomized Preconditioning. Proc Annu ACM SIAM Symp Discret Algorithms. 2016 Jan;2016:558-569. doi: 10.1137/1.9781611974331.ch41.
5. Accelerating Minibatch Stochastic Gradient Descent Using Typicality Sampling. IEEE Trans Neural Netw Learn Syst. 2020 Nov;31(11):4649-4659. doi: 10.1109/TNNLS.2019.2957003. Epub 2020 Oct 29.
6. Monte Carlo simulations of glass-forming liquids beyond Metropolis. J Chem Phys. 2024 Sep 21;161(11). doi: 10.1063/5.0225978.
7. The inverse variance-flatness relation in stochastic gradient descent is critical for finding flat minima. Proc Natl Acad Sci U S A. 2021 Mar 2;118(9). doi: 10.1073/pnas.2015617118.
8. Anomalous diffusion dynamics of learning in deep neural networks. Neural Netw. 2022 May;149:18-28. doi: 10.1016/j.neunet.2022.01.019. Epub 2022 Feb 3.
9. A(DP)²SGD: Asynchronous Decentralized Parallel Stochastic Gradient Descent With Differential Privacy. IEEE Trans Pattern Anal Mach Intell. 2022 Nov;44(11):8036-8047. doi: 10.1109/TPAMI.2021.3107796. Epub 2022 Oct 4.
10. A Monte Carlo Metropolis-Hastings algorithm for sampling from distributions with intractable normalizing constants. Neural Comput. 2013 Aug;25(8):2199-234. doi: 10.1162/NECO_a_00466. Epub 2013 Apr 22.
