School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu, PR China.
School of Mathematics, Southeast University, Nanjing, Jiangsu, PR China.
Neural Netw. 2024 Jun;174:106212. doi: 10.1016/j.neunet.2024.106212. Epub 2024 Feb 27.
Second-order distributed optimization algorithms have recently become a research hotspot in distributed learning, owing to their faster convergence than first-order algorithms. However, second-order algorithms typically suffer from a severe communication bottleneck. To address this challenge, we propose communication-efficient second-order distributed optimization algorithms in the parameter-server framework by incorporating cubic Newton methods with compressed lazy Hessians. Specifically, our algorithms require each worker to communicate compressed Hessians to the server only at certain iterations, which saves both communication bits and communication rounds. For non-convex problems, we theoretically prove that our algorithms reduce the communication cost compared with state-of-the-art second-order algorithms, while maintaining the same iteration complexity order O(ϵ^{-3/2}) as the centralized cubic Newton method. By further applying a gradient regularization technique, our algorithms achieve global convergence for convex problems. Moreover, for strongly convex problems, our algorithms attain a local superlinear convergence rate without any requirement on the initial conditions. Finally, numerical experiments demonstrate the high efficiency of the proposed algorithms.
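For reference, the centralized cubic Newton step mentioned above computes the next iterate by minimizing a second-order model of the objective with a cubic penalty; the following is the standard cubic-regularization formulation in our own notation, not a formula taken from the paper:

x_{k+1} \in \arg\min_{y \in \mathbb{R}^d} \Big\{ \langle \nabla f(x_k),\, y - x_k \rangle + \tfrac{1}{2} \langle \nabla^2 f(x_k)(y - x_k),\, y - x_k \rangle + \tfrac{M}{6} \| y - x_k \|^3 \Big\},

where M > 0 is a regularization parameter tied to the Lipschitz constant of the Hessian. In the distributed setting described in the abstract, the server would assemble the gradient and Hessian in this model from the workers' local information, with the Hessians sent in compressed form and refreshed only lazily.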
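The communication pattern described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the names (QuadraticWorker, compress_topk, damped_newton_step, server_loop), the top-k compressor, the damped-Newton stand-in for the cubic subproblem, and all parameter values are our own assumptions, chosen only to show how gradients are exchanged every iteration while compressed Hessians are exchanged only every lazy_period iterations.

import numpy as np

class QuadraticWorker:
    """Toy worker holding a local quadratic objective 0.5*(x-b)^T A (x-b) (illustrative only)."""
    def __init__(self, A, b):
        self.A, self.b = A, b
    def gradient(self, x):
        return self.A @ (x - self.b)
    def hessian(self, x):
        return self.A

def compress_topk(matrix, k):
    """Keep the k largest-magnitude entries; a simple stand-in for the paper's compression operator."""
    flat = matrix.flatten()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(matrix.shape)

def damped_newton_step(g, H, M):
    """Placeholder for the cubic-model subproblem: a regularized Newton direction."""
    d = g.shape[0]
    return -np.linalg.solve(H + M * np.eye(d), g)

def server_loop(workers, x0, iters=100, lazy_period=5, k=20, M=5.0):
    """Parameter-server loop: gradients every iteration, compressed Hessians only every lazy_period iterations."""
    x = x0.copy()
    d = x.shape[0]
    H_agg = np.eye(d)  # most recently aggregated compressed Hessian
    for t in range(iters):
        g = np.mean([w.gradient(x) for w in workers], axis=0)
        if t % lazy_period == 0:  # lazy Hessian communication round
            H_agg = np.mean([compress_topk(w.hessian(x), k) for w in workers], axis=0)
        x = x + damped_newton_step(g, H_agg, M)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 10
    workers = []
    for _ in range(4):
        Q = rng.standard_normal((d, d))
        A = Q @ Q.T / d + np.eye(d)  # positive definite local curvature
        workers.append(QuadraticWorker(A, rng.standard_normal(d)))
    x_final = server_loop(workers, np.zeros(d))
    print("final gradient norm:",
          np.linalg.norm(np.mean([w.gradient(x_final) for w in workers], axis=0)))

In this sketch only the Hessian rounds pay the quadratic-size communication cost, and they occur once every lazy_period iterations with compressed entries, which is the source of the savings in both bits and rounds that the abstract refers to.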