Wang Chien-Chih, Tan Kent Loong, Chen Chun-Ting, Lin Yu-Hsiang, Keerthi S Sathiya, Mahajan Dhruv, Sundararajan S, Lin Chih-Jen
Department of Computer Science, National Taiwan University, Taipei 10617, Taiwan
Department of Physics, National Taiwan University, Taipei 10617, Taiwan
Neural Comput. 2018 Jun;30(6):1673-1724. doi: 10.1162/neco_a_01088. Epub 2018 Apr 13.
Deep learning involves a difficult nonconvex optimization problem with a large number of weights between any two adjacent layers of a deep structure. To handle large data sets or complicated networks, distributed training is needed, but the calculation of the function, gradient, and Hessian is expensive. In particular, the communication and synchronization costs may become a bottleneck. In this letter, we focus on situations where the model is distributedly stored and propose a novel distributed Newton method for training deep neural networks. By variable and feature-wise data partitions and some careful designs, we are able to explicitly use the Jacobian matrix for matrix-vector products in the Newton method. Some techniques are incorporated to reduce the running time as well as the memory consumption. First, to reduce the communication cost, we propose a diagonalization method such that an approximate Newton direction can be obtained without communication between machines. Second, we consider subsampled Gauss-Newton matrices to reduce the running time as well as the communication cost. Third, to reduce the synchronization cost, we terminate the process of finding an approximate Newton direction even though some nodes have not finished their tasks. Details of some implementation issues in distributed environments are thoroughly investigated. Experiments demonstrate that the proposed method is effective for the distributed training of deep neural networks. Compared with stochastic gradient methods, it is more robust and may give better test accuracy.
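As a rough illustration of the core operation the letter builds on, the sketch below computes a subsampled Gauss-Newton matrix-vector product (J^T B J)v with JAX's forward- and reverse-mode primitives. It is a single-machine toy, not the authors' distributed implementation: the network, loss, data shapes, and function names (toy_net, gauss_newton_vp) are illustrative assumptions, and the paper's variable/feature-wise partitioning, diagonalization, and early termination are not shown. Products of this form are what a conjugate-gradient solver would call repeatedly to obtain an approximate Newton direction.

import jax
import jax.numpy as jnp

def toy_net(w, x):
    # Tiny two-layer network; w is one flat weight vector (illustrative assumption).
    W1 = w[:8].reshape(2, 4)
    W2 = w[8:].reshape(4, 3)
    return jnp.tanh(x @ W1) @ W2          # network outputs z(w, x)

def loss(z, y):
    # Squared loss on the network outputs.
    return 0.5 * jnp.sum((z - y) ** 2)

def gauss_newton_vp(w, x, y, v):
    # Return (J^T B J) v on the subsample (x, y), where J = dz/dw and
    # B is the Hessian of the loss with respect to the outputs z.
    z, Jv = jax.jvp(lambda w_: toy_net(w_, x), (w,), (v,))            # forward mode: J v
    BJv = jax.jvp(jax.grad(lambda z_: loss(z_, y)), (z,), (Jv,))[1]   # output-space Hessian: B (J v)
    _, vjp_fn = jax.vjp(lambda w_: toy_net(w_, x), w)                 # reverse mode: J^T (B J v)
    return vjp_fn(BJv)[0]

keys = jax.random.split(jax.random.PRNGKey(0), 4)
w = jax.random.normal(keys[0], (20,))      # 8 + 12 weights of the toy network
x = jax.random.normal(keys[1], (16, 2))    # a subsampled mini-batch of inputs
y = jax.random.normal(keys[2], (16, 3))    # and targets
v = jax.random.normal(keys[3], (20,))      # direction supplied by, e.g., a CG iteration
print(gauss_newton_vp(w, x, y, v).shape)   # (20,): same shape as the weight vector

Because the product is formed from Jacobian-vector and vector-Jacobian calls on a data subsample, the full Gauss-Newton matrix is never materialized, which is what makes subsampling and per-machine approximations of the kind described in the abstract attractive.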