Spiridonoff Artin, Olshevsky Alex, Paschalidis Ioannis Ch.
Division of Systems Engineering, Boston University, Boston, MA 02215, USA.
J Mach Learn Res. 2020;21.
We consider the standard model of distributed optimization of a sum of functions F(z) = f_1(z) + ... + f_n(z), where node i in a network holds the function f_i(z). We allow for a harsh network model characterized by asynchronous updates, message delays, unpredictable message losses, and directed communication among nodes. In this setting, we analyze a modification of the Gradient-Push method for distributed optimization, assuming that (i) node i is capable of generating gradients of its function f_i(z) corrupted by zero-mean bounded-support additive noise at each step, (ii) F(z) is strongly convex, and (iii) each f_i(z) has Lipschitz gradients. We show that our proposed method asymptotically performs as well as the best bounds on centralized gradient descent that takes steps in the direction of the sum of the noisy gradients of all the functions f_1(z), ..., f_n(z) at each step.
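To make the setting concrete, the following is a minimal, synchronous Python sketch of the Gradient-Push (push-sum gradient) idea the abstract builds on, with noisy local gradients on a directed graph. It is not the paper's robust asynchronous algorithm: there are no delays, message losses, or asynchronous updates here, and the quadratic local functions, the graph, and the step-size schedule are illustrative assumptions chosen so the script runs end to end.

import numpy as np

rng = np.random.default_rng(0)
n = 5                                    # number of nodes
# Directed ring with one extra edge; out_neighbors[i] includes i (self-loop),
# so the graph is strongly connected and push-sum weights stay positive.
out_neighbors = [[i, (i + 1) % n] for i in range(n)]
out_neighbors[0].append(2)
out_deg = [len(nbrs) for nbrs in out_neighbors]

# Local functions f_i(z) = 0.5 * a_i * ||z - b_i||^2: their sum F is strongly
# convex and each f_i has Lipschitz gradients, matching assumptions (ii)-(iii).
d = 3
a = rng.uniform(1.0, 3.0, size=n)
b = rng.normal(size=(n, d))
z_star = (a[:, None] * b).sum(axis=0) / a.sum()   # minimizer of the sum F

def noisy_grad(i, z):
    # Local gradient corrupted by zero-mean, bounded-support additive noise,
    # as in assumption (i).
    return a[i] * (z - b[i]) + rng.uniform(-0.1, 0.1, size=d)

x = np.zeros((n, d))     # push-sum numerators
y = np.ones(n)           # push-sum weights
z = x / y[:, None]       # local estimates z_i = x_i / y_i

for t in range(1, 5001):
    alpha = 1.0 / t                       # diminishing step size
    new_x = np.zeros_like(x)
    new_y = np.zeros(n)
    # Each node "pushes" equal shares of (x_i, y_i) to its out-neighbors.
    for i in range(n):
        for j in out_neighbors[i]:
            new_x[j] += x[i] / out_deg[i]
            new_y[j] += y[i] / out_deg[i]
    z = new_x / new_y[:, None]
    # Each node then takes a local noisy gradient step on its numerator.
    for i in range(n):
        new_x[i] -= alpha * noisy_grad(i, z[i])
    x, y = new_x, new_y

print("consensus error :", np.max(np.linalg.norm(z - z.mean(axis=0), axis=1)))
print("distance to z*  :", np.linalg.norm(z.mean(axis=0) - z_star))

Under these assumptions, all local estimates z_i reach consensus and approach the minimizer of F; the paper's contribution is showing that a suitably modified version of this scheme retains such guarantees, at an asymptotically centralized rate, even under asynchrony, delays, and message losses.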