Shi Pu, Alex Olshevsky, Ioannis Ch. Paschalidis
Institute for Data and Decision Analytics, The Chinese University of Hong Kong, Shenzhen, China, and the Shenzhen Research Institute of Big Data. The research was conducted while the author was with the Division of Systems Engineering, Boston University, Boston, MA.
Department of Electrical and Computer Engineering and Division of Systems Engineering, Boston University, Boston, MA.
IEEE Signal Process Mag. 2020 May;37(3):114-122. doi: 10.1109/msp.2020.2975212. Epub 2020 May 6.
We discuss several recent results that, in certain scenarios, overcome a barrier in distributed stochastic optimization for machine learning. Our focus is the so-called asymptotic network independence property, which is achieved whenever a distributed method executed over a network of nodes asymptotically converges to the optimal solution at a rate comparable to that of a centralized method with the same computational power as the entire network. We explain this property through an example involving the training of ML models and sketch a short mathematical analysis comparing the performance of distributed stochastic gradient descent (DSGD) with centralized stochastic gradient descent (SGD).
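As an illustrative sketch only (not code from the article), the following Python snippet contrasts centralized SGD with DSGD on a toy least-squares problem. The problem size, ring-graph mixing matrix W, step-size schedule, and noise level are all assumptions chosen for the example.

import numpy as np

# Toy problem: f(x) = (1/n) * sum_i (a_i^T x - b_i)^2, split across n nodes.
rng = np.random.default_rng(0)
n, d, T = 10, 5, 2000                      # nodes, dimension, iterations (assumed values)
A = rng.normal(size=(n, d))
b = rng.normal(size=n)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]   # minimizer of the global objective

def stoch_grad(i, x):
    # Noisy gradient of node i's local loss (a_i^T x - b_i)^2.
    return 2.0 * A[i] * (A[i] @ x - b[i]) + 0.1 * rng.normal(size=d)

# Doubly stochastic mixing matrix for a ring graph (lazy Metropolis weights).
W = 0.5 * np.eye(n)
for i in range(n):
    W[i, (i - 1) % n] += 0.25
    W[i, (i + 1) % n] += 0.25

x_c = np.zeros(d)          # centralized SGD iterate
X = np.zeros((n, d))       # DSGD iterates, one row per node

for t in range(1, T + 1):
    alpha = 1.0 / (t + 10)                 # diminishing step size (assumed schedule)
    # Centralized SGD with the computational power of the whole network:
    # it averages n stochastic gradients per iteration.
    x_c -= alpha * np.mean([stoch_grad(i, x_c) for i in range(n)], axis=0)
    # DSGD: each node takes a local stochastic gradient step, then averages
    # with its neighbors through the mixing matrix W.
    G = np.stack([stoch_grad(i, X[i]) for i in range(n)])
    X = W @ (X - alpha * G)

print("centralized SGD error:", np.linalg.norm(x_c - x_star))
print("DSGD (network average) error:", np.linalg.norm(X.mean(axis=0) - x_star))

With a diminishing step size, the network average of the DSGD iterates tracks the centralized SGD trajectory over time; this comparable asymptotic behavior is what the asymptotic network independence property describes.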