Faster Stochastic Quasi-Newton Methods.

Authors

Zhang Qingsong, Huang Feihu, Deng Cheng, Huang Heng

Publication

IEEE Trans Neural Netw Learn Syst. 2022 Sep;33(9):4388-4397. doi: 10.1109/TNNLS.2021.3056947. Epub 2022 Aug 31.

Abstract

Stochastic optimization methods have become a popular class of optimization tools in machine learning. In particular, stochastic gradient descent (SGD) has been widely used for machine learning problems, such as training neural networks, owing to its low per-iteration computational complexity. Newton and quasi-Newton (QN) methods, which leverage second-order information, are able to reach better solutions than first-order methods. Stochastic QN (SQN) methods have therefore been developed to obtain better solutions more efficiently than stochastic first-order methods by exploiting approximate second-order information. However, the existing SQN methods still do not reach the best known stochastic first-order oracle (SFO) complexity. To fill this gap, we propose a novel faster stochastic QN method (SpiderSQN) based on the SPIDER variance-reduction technique. We prove that our SpiderSQN method reaches the best known SFO complexity of O(n + n^{1/2}ϵ^{-2}) in the finite-sum setting for obtaining an ϵ-first-order stationary point. To further improve its practical performance, we incorporate SpiderSQN with different momentum schemes. Moreover, the proposed algorithms are generalized to the online setting, where the corresponding SFO complexity of O(ϵ^{-3}) is established, which also matches the existing best result. Extensive experiments on benchmark data sets demonstrate that our new algorithms outperform state-of-the-art approaches for nonconvex optimization.
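The SPIDER variance-reduction idea at the core of the method can be illustrated with a toy sketch: a recursive gradient estimator that is refreshed with a full gradient every q iterations, combined here with a crude diagonal scaling standing in for a quasi-Newton inverse-Hessian approximation. This is not the authors' SpiderSQN algorithm (which uses a proper QN Hessian approximation and momentum schemes); the problem, step size, and scaling below are illustrative assumptions.

```python
import numpy as np

# Toy finite-sum least squares: f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2
rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star

def grad_batch(x, idx):
    """Mini-batch gradient of the least-squares objective over indices idx."""
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / len(idx)

def spider_sqn_sketch(x0, eta=0.05, q=20, batch=10, iters=200):
    """SPIDER-style variance-reduced iteration with an assumed diagonal
    quasi-Newton scaling (illustrative sketch, not the paper's update)."""
    x = x0.copy()
    v = grad_batch(x, np.arange(n))          # full gradient at the start
    H_diag = np.ones(d)                      # placeholder inverse-Hessian surrogate
    for t in range(iters):
        x_new = x - eta * H_diag * v         # scaled (quasi-Newton-like) step
        if (t + 1) % q == 0:
            v = grad_batch(x_new, np.arange(n))  # periodic full-gradient refresh
        else:
            idx = rng.choice(n, size=batch, replace=False)
            # SPIDER recursion: v_t = grad_S(x_t) - grad_S(x_{t-1}) + v_{t-1}
            v = grad_batch(x_new, idx) - grad_batch(x, idx) + v
        x = x_new
    return x

x_hat = spider_sqn_sketch(np.zeros(d))
```

The recursion keeps the per-iteration cost at a small mini-batch while the periodic full-gradient refresh controls the accumulated estimator variance, which is what drives the improved SFO complexity.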

