Stochastic Gradient Descent-like relaxation is equivalent to Metropolis dynamics in discrete optimization and inference problems.

Authors

Angelini Maria Chiara, Cavaliere Angelo Giorgio, Marino Raffaele, Ricci-Tersenghi Federico

Affiliations

Dipartimento di Fisica, Sapienza Università di Roma, P.le Aldo Moro 5, 00185, Rome, Italy.

Istituto Nazionale di Fisica Nucleare, Sezione di Roma I, P.le A. Moro 5, 00185, Rome, Italy.

Publication

Sci Rep. 2024 May 21;14(1):11638. doi: 10.1038/s41598-024-62625-8.

Abstract

Is Stochastic Gradient Descent (SGD) substantially different from Metropolis Monte Carlo dynamics? This is a fundamental question for understanding the most widely used training algorithm in Machine Learning, yet it has received no answer until now. Here we show that, in discrete optimization and inference problems, the dynamics of an SGD-like algorithm closely resembles that of Metropolis Monte Carlo at a properly chosen temperature, which depends on the mini-batch size. This quantitative matching holds both at equilibrium and in the out-of-equilibrium regime, despite the two algorithms having fundamental differences (e.g. SGD does not satisfy detailed balance). The equivalence allows us to use results on the performance and limits of Monte Carlo algorithms to optimize the mini-batch size in the SGD-like algorithm and make it efficient at recovering the signal in hard inference problems.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5822/11639713/a8505a5b66b3/41598_2024_62625_Fig1_HTML.jpg
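
The abstract states the equivalence but not the algorithmic details. As a rough illustration of the kind of comparison involved, the following minimal Python sketch (not the authors' code) runs an SGD-like relaxation, where single-spin flips are accepted greedily on the energy of a random mini-batch, against standard Metropolis dynamics on the full energy, for a toy planted-sign inference problem. The model, system sizes, mini-batch sizes B, temperatures T, and step counts are all illustrative assumptions, not values from the paper.

```python
# Minimal sketch: SGD-like mini-batch relaxation vs. Metropolis dynamics
# on a toy discrete inference problem (recovering a planted sign vector
# from random linear measurements). All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

N, M = 100, 400                          # spins and measurements (toy sizes)
x_star = rng.choice([-1, 1], size=N)     # planted signal to be recovered
xi = rng.normal(size=(M, N))             # random measurement patterns
y = np.sign(xi @ x_star)                 # observed labels

def energy(x, idx=None):
    """Number of violated measurements, optionally restricted to a mini-batch."""
    if idx is None:
        idx = slice(None)
    return np.sum(np.sign(xi[idx] @ x) != y[idx])

def sgd_like(B, steps=20000):
    """Greedy single-spin flips accepted on the energy of a random mini-batch of size B."""
    x = rng.choice([-1, 1], size=N)
    for _ in range(steps):
        i = rng.integers(N)
        batch = rng.choice(M, size=B, replace=False)
        e_old = energy(x, batch)
        x[i] *= -1
        if energy(x, batch) > e_old:     # reject flips that raise the mini-batch energy
            x[i] *= -1
    return x

def metropolis(T, steps=20000):
    """Standard Metropolis single-spin flips on the full energy at temperature T."""
    x = rng.choice([-1, 1], size=N)
    e = energy(x)
    for _ in range(steps):
        i = rng.integers(N)
        x[i] *= -1
        e_new = energy(x)
        dE = e_new - e
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            e = e_new                    # accept the flip
        else:
            x[i] *= -1                   # reject and restore the spin
    return x

for B in (10, 40, 160):
    x = sgd_like(B)
    print(f"SGD-like   B={B:4d}  energy={energy(x):4d}  overlap={abs(x @ x_star)/N:.2f}")
for T in (2.0, 1.0, 0.5):
    x = metropolis(T)
    print(f"Metropolis T={T:3.1f}  energy={energy(x):4d}  overlap={abs(x @ x_star)/N:.2f}")
```

The printout only makes the comparison concrete by reporting final energies and overlaps with the planted signal for different mini-batch sizes and temperatures; the quantitative mapping between mini-batch size and effective temperature is established in the paper, not by this sketch.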
