Scardapane Simone, Di Lorenzo Paolo
Department of Information Engineering, Electronics and Telecommunications, "Sapienza" University of Rome, Via Eudossiana 18, 00184 Rome, Italy.
Department of Engineering, University of Perugia, Via G. Duranti 93, 06125, Perugia, Italy.
Neural Netw. 2017 Jul;91:42-54. doi: 10.1016/j.neunet.2017.04.004. Epub 2017 Apr 19.
The aim of this paper is to develop a general framework for training neural networks (NNs) in a distributed environment, where training data is partitioned over a set of agents that communicate with each other through a sparse, possibly time-varying, connectivity pattern. In such a distributed scenario, the training problem can be formulated as the (regularized) optimization of a non-convex social cost function, given by the sum of local (non-convex) costs, where each agent contributes with a single error term defined with respect to its local dataset. To devise a flexible and efficient solution, we customize a recently proposed framework for non-convex optimization over networks, which hinges on a (primal) convexification-decomposition technique to handle non-convexity, and a dynamic consensus procedure to diffuse information among the agents. Several typical choices for the training criterion (e.g., squared loss, cross entropy, etc.) and regularization (e.g., ℓ2 norm, sparsity-inducing penalties, etc.) are included in the framework and explored throughout the paper. Convergence to a stationary solution of the social non-convex problem is guaranteed under mild assumptions. Additionally, we show a principled way of allowing each agent to exploit a possible multi-core architecture (e.g., a local cloud) in order to parallelize its local optimization step, resulting in strategies that are both distributed (across the agents) and parallel (inside each agent) in nature. A comprehensive set of experimental results validates the proposed approach.
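To make the setting concrete, the following is a minimal sketch, not the paper's exact algorithm, of a distributed scheme combining a convexified local step with dynamic consensus (gradient tracking) for a toy social cost given by the sum of local squared losses plus an ℓ2 regularizer. All names and hyper-parameters (number of agents, the ring communication graph, the mixing matrix W, the proximal weight tau, the step size alpha) are illustrative assumptions and not taken from the paper.

```python
import numpy as np

# Hypothetical sketch: in-network convexification-decomposition with dynamic
# consensus (gradient tracking) on a toy distributed least-squares problem.
# Each agent only accesses its local dataset (A_i, b_i) and communicates with
# its neighbors on a ring graph; hyper-parameters are illustrative assumptions.

rng = np.random.default_rng(0)

n_agents, n_samples, dim = 8, 50, 10   # agents, samples per agent, weight size
lam = 0.1                              # l2 regularization weight (shared)
tau, alpha = 20.0, 0.5                 # proximal weight and combination step

# Local datasets: agent i only sees (A[i], b[i]).
A = [rng.normal(size=(n_samples, dim)) for _ in range(n_agents)]
w_true = rng.normal(size=dim)
b = [A_i @ w_true + 0.1 * rng.normal(size=n_samples) for A_i in A]

def local_grad(i, x):
    """Gradient of agent i's local cost: squared loss + its share of the l2 penalty."""
    return A[i].T @ (A[i] @ x - b[i]) / n_samples + (lam / n_agents) * x

# Doubly stochastic mixing matrix for a ring graph (Metropolis weights).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    for j in ((i - 1) % n_agents, (i + 1) % n_agents):
        W[i, j] = 1.0 / 3.0
    W[i, i] = 1.0 - W[i].sum()

# Per-agent estimates x_i and gradient-tracking variables y_i.
x = np.zeros((n_agents, dim))
y = np.array([local_grad(i, x[i]) for i in range(n_agents)])

for it in range(500):
    grads_old = np.array([local_grad(i, x[i]) for i in range(n_agents)])

    # 1) Local convexified step: each agent minimizes a linearized, strongly
    #    convex surrogate of the social cost around x_i, using n_agents * y_i
    #    as its running estimate of the sum of all local gradients.
    x_tilde = x - (n_agents / tau) * y
    z = x + alpha * (x_tilde - x)

    # 2) Consensus on the estimates over the sparse communication graph.
    x = W @ z

    # 3) Dynamic consensus (gradient tracking) update.
    grads_new = np.array([local_grad(i, x[i]) for i in range(n_agents)])
    y = W @ y + grads_new - grads_old

print("disagreement across agents:", np.linalg.norm(x - x.mean(axis=0)))
print("distance to w_true:", np.linalg.norm(x.mean(axis=0) - w_true))
```

In this sketch the surrogate is simply the gradient linearization plus a proximal term, so the local step has a closed form; with a non-convex NN loss, as in the paper, the same structure applies but the local convexified subproblem is generally solved numerically, and the subproblem is where a multi-core agent could parallelize its work.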