
Accelerated Distributed Approximate Newton Method

Authors

Ye Haishan, He Chaoyang, Chang Xiangyu

Publication

IEEE Trans Neural Netw Learn Syst. 2022 Mar 7;PP. doi: 10.1109/TNNLS.2022.3151736.

Abstract

Distributed second-order optimization, an effective strategy for training large-scale machine learning systems, has been widely investigated due to its low communication complexity. However, the existing distributed second-order optimization algorithms, including distributed approximate Newton (DANE), accelerated inexact DANE (AIDE), and statistically preconditioned accelerated gradient (SPAG), all require solving an expensive subproblem up to the target precision. As a result, these algorithms suffer from high computation costs, which hinders their development. In this article, we design a novel distributed second-order algorithm, the accelerated distributed approximate Newton (ADAN) method, to overcome the high computation costs of the existing ones. Whereas DANE, AIDE, and SPAG are built on relative smoothness theory, ADAN's theoretical foundation rests on inexact Newton theory. This different foundation allows ADAN to handle the expensive subproblem efficiently: the number of steps required to solve the subproblem is independent of the target precision. At the same time, ADAN employs acceleration and can effectively exploit the objective function's curvature information, which lets it achieve low communication complexity. Thus, ADAN achieves both communication and computation efficiency, while DANE, AIDE, and SPAG achieve only communication efficiency. Our empirical study also validates the advantages of ADAN over extant distributed second-order algorithms.
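The key idea the abstract attributes to inexact Newton theory is that the Newton subproblem H d = -g need not be solved to the target precision; a fixed number of inner iterations suffices. The following sketch illustrates this with a conjugate-gradient inner solver on a toy quadratic. It is a minimal illustration of the inexact-Newton principle, not the authors' ADAN algorithm, and all names (`inexact_newton_step`, `cg_iters`) are hypothetical.

```python
import numpy as np

def inexact_newton_step(grad, hess_vec, dim, cg_iters=10):
    """Approximately solve H d = -g with a fixed number of CG iterations.

    The inner-iteration count is fixed in advance rather than tied to a
    target precision, mirroring the inexact-Newton idea in the abstract.
    Only Hessian-vector products are needed, not the full Hessian.
    """
    d = np.zeros(dim)
    r = -grad - hess_vec(d)          # residual of H d = -g
    p = r.copy()
    rs = r @ r
    for _ in range(cg_iters):
        Hp = hess_vec(p)
        alpha = rs / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if rs_new < 1e-20:           # residual already negligible
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

# Toy strongly convex quadratic f(x) = 0.5 x^T A x - b^T x.
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 5))
A = M.T @ M + np.eye(5)              # symmetric positive definite Hessian
b = rng.standard_normal(5)

x = np.zeros(5)
for _ in range(5):                   # a few outer Newton iterations
    g = A @ x - b
    x += inexact_newton_step(g, lambda v: A @ v, 5)

x_star = np.linalg.solve(A, b)       # exact minimizer for comparison
```

Even with the inner solve truncated, the outer iteration converges to the minimizer of the quadratic; in the distributed setting the abstract describes, each inner step would additionally be computed from local curvature information on the workers.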

