School of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu, PR China.
School of Mathematics, Southeast University, Nanjing, Jiangsu, PR China.
Neural Netw. 2024 Jun;174:106212. doi: 10.1016/j.neunet.2024.106212. Epub 2024 Feb 27.
Second-order distributed optimization algorithms have recently become a research hotspot in distributed learning, owing to their faster convergence than first-order algorithms. However, second-order algorithms typically suffer from a severe communication bottleneck. To address this challenge, we propose communication-efficient second-order distributed optimization algorithms in the parameter-server framework by incorporating cubic Newton methods with compressed lazy Hessians. Specifically, our algorithms require each worker to communicate compressed Hessians to the server only at certain iterations, which saves both communication bits and communication rounds. For non-convex problems, we theoretically prove that our algorithms reduce the communication cost compared with state-of-the-art second-order algorithms, while maintaining the same iteration complexity order O(ϵ^{-3/2}) as the centralized cubic Newton method. By further applying a gradient regularization technique, our algorithms achieve global convergence for convex problems. Moreover, for strongly convex problems, our algorithms attain a local superlinear convergence rate without any requirement on the initial conditions. Finally, numerical experiments demonstrate the high efficiency of the proposed algorithms.
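For reference, the centralized cubic Newton step mentioned above computes the next iterate by minimizing a second-order model of the objective with a cubic penalty; the following is the standard cubic-regularization formulation in our own notation, not a formula taken from the paper:

x_{k+1} \in \arg\min_{y \in \mathbb{R}^d} \Big\{ \langle \nabla f(x_k),\, y - x_k \rangle + \tfrac{1}{2} \langle \nabla^2 f(x_k)(y - x_k),\, y - x_k \rangle + \tfrac{M}{6} \| y - x_k \|^3 \Big\},

where M > 0 is a regularization parameter tied to the Lipschitz constant of the Hessian. In the distributed setting described in the abstract, the server would assemble the gradient and Hessian in this model from the workers' local information, with the Hessians sent in compressed form and refreshed only lazily.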
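The communication pattern described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the names (QuadraticWorker, compress_topk, damped_newton_step, server_loop), the top-k compressor, the damped-Newton stand-in for the cubic subproblem, and all parameter values are our own assumptions, chosen only to show how gradients are exchanged every iteration while compressed Hessians are exchanged only every lazy_period iterations.

import numpy as np

class QuadraticWorker:
    """Toy worker holding a local quadratic objective 0.5*(x-b)^T A (x-b) (illustrative only)."""
    def __init__(self, A, b):
        self.A, self.b = A, b
    def gradient(self, x):
        return self.A @ (x - self.b)
    def hessian(self, x):
        return self.A

def compress_topk(matrix, k):
    """Keep the k largest-magnitude entries; a simple stand-in for the paper's compression operator."""
    flat = matrix.flatten()
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(matrix.shape)

def damped_newton_step(g, H, M):
    """Placeholder for the cubic-model subproblem: a regularized Newton direction."""
    d = g.shape[0]
    return -np.linalg.solve(H + M * np.eye(d), g)

def server_loop(workers, x0, iters=100, lazy_period=5, k=20, M=5.0):
    """Parameter-server loop: gradients every iteration, compressed Hessians only every lazy_period iterations."""
    x = x0.copy()
    d = x.shape[0]
    H_agg = np.eye(d)  # most recently aggregated compressed Hessian
    for t in range(iters):
        g = np.mean([w.gradient(x) for w in workers], axis=0)
        if t % lazy_period == 0:  # lazy Hessian communication round
            H_agg = np.mean([compress_topk(w.hessian(x), k) for w in workers], axis=0)
        x = x + damped_newton_step(g, H_agg, M)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 10
    workers = []
    for _ in range(4):
        Q = rng.standard_normal((d, d))
        A = Q @ Q.T / d + np.eye(d)  # positive definite local curvature
        workers.append(QuadraticWorker(A, rng.standard_normal(d)))
    x_final = server_loop(workers, np.zeros(d))
    print("final gradient norm:",
          np.linalg.norm(np.mean([w.gradient(x_final) for w in workers], axis=0)))

In this sketch only the Hessian rounds pay the quadratic-size communication cost, and they occur once every lazy_period iterations with compressed entries, which is the source of the savings in both bits and rounds that the abstract refers to.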