Angelini Maria Chiara, Cavaliere Angelo Giorgio, Marino Raffaele, Ricci-Tersenghi Federico
Dipartimento di Fisica, Sapienza Università di Roma, P.le Aldo Moro 5, 00185, Rome, Italy.
Istituto Nazionale di Fisica Nucleare, Sezione di Roma I, P.le A. Moro 5, 00185, Rome, Italy.
Sci Rep. 2024 May 21;14(1):11638. doi: 10.1038/s41598-024-62625-8.
Is Stochastic Gradient Descent (SGD) substantially different from Metropolis Monte Carlo dynamics? This is a fundamental question for understanding the most widely used training algorithm in Machine Learning, yet until now it has remained unanswered. Here we show that in discrete optimization and inference problems, the dynamics of an SGD-like algorithm closely resemble those of Metropolis Monte Carlo at a properly chosen temperature, which depends on the mini-batch size. This quantitative matching holds both at equilibrium and in the out-of-equilibrium regime, despite fundamental differences between the two algorithms (e.g. SGD does not satisfy detailed balance). The equivalence allows us to use known results on the performance and limits of Monte Carlo algorithms to optimize the mini-batch size in the SGD-like algorithm and to make it efficient at recovering the signal in hard inference problems.
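To make the comparison concrete, below is a minimal, self-contained sketch contrasting the two dynamics on a toy discrete problem. It is not the paper's actual models or code: the energy function, the temperature T=0.5, and the mini-batch size B=40 are illustrative assumptions. The SGD-like move flips a spin whenever the flip lowers the energy restricted to a random mini-batch of terms; the sampling noise of the mini-batch plays the role of the Metropolis temperature, with smaller batches corresponding to higher effective temperatures.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete problem (assumption, not the paper's setup):
# N binary spins, energy summed over M random "data" terms.
N, M = 100, 400
J = rng.normal(size=(M, N)) / np.sqrt(N)  # couplings of each term

def term_energies(s):
    # Per-term energies; total energy is E(s) = -sum_mu (J s)_mu.
    return -(J @ s)

def metropolis_step(s, T):
    """One Metropolis Monte Carlo move at temperature T:
    propose a single spin flip, accept with prob min(1, exp(-dE/T))."""
    i = rng.integers(N)
    s_new = s.copy()
    s_new[i] *= -1
    dE = term_energies(s_new).sum() - term_energies(s).sum()
    if dE <= 0 or rng.random() < np.exp(-dE / T):
        return s_new
    return s

def sgd_like_step(s, B):
    """One SGD-like move: flip a spin if the flip lowers the energy
    computed on a random mini-batch of B terms (greedy on the batch).
    Smaller B -> noisier estimate of dE -> higher effective temperature."""
    i = rng.integers(N)
    batch = rng.choice(M, size=B, replace=False)
    s_new = s.copy()
    s_new[i] *= -1
    dE_batch = (term_energies(s_new)[batch] - term_energies(s)[batch]).sum()
    return s_new if dE_batch <= 0 else s

s_mc = rng.choice([-1, 1], size=N)
s_sgd = s_mc.copy()
for t in range(20000):
    s_mc = metropolis_step(s_mc, T=0.5)  # temperature: assumed value
    s_sgd = sgd_like_step(s_sgd, B=40)   # mini-batch size: assumed value

print("Metropolis energy:", term_energies(s_mc).sum())
print("SGD-like energy:  ", term_energies(s_sgd).sum())
```

Under the paper's claimed correspondence, tuning B in the SGD-like rule should track the equilibrium behavior of Metropolis at the matched temperature; in this sketch, both chains relax to comparable energy levels for a suitable (T, B) pairing.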