

A Sharp Estimate on the Transient Time of Distributed Stochastic Gradient Descent.

Author Information

Shi Pu, Alex Olshevsky, Ioannis Ch. Paschalidis

Affiliations

School of Data Science, Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong, Shenzhen, China.

Department of Electrical and Computer Engineering and the Division of Systems Engineering, Boston University, Boston, MA.

Publication Information

IEEE Trans Automat Contr. 2022 Nov;67(11):5900-5915. doi: 10.1109/tac.2021.3126253. Epub 2021 Nov 9.

DOI: 10.1109/tac.2021.3126253
PMID: 37284602
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10241409/
Abstract

This paper is concerned with minimizing the average of cost functions over a network in which agents may communicate and exchange information with each other. We consider the setting where only noisy gradient information is available. To solve the problem, we study the distributed stochastic gradient descent (DSGD) method and perform a non-asymptotic convergence analysis. For strongly convex and smooth objective functions, in expectation, DSGD asymptotically achieves the optimal network-independent convergence rate compared to centralized stochastic gradient descent (SGD). Our main contribution is to characterize the transient time needed for DSGD to approach the asymptotic convergence rate. Moreover, we construct a "hard" optimization problem that proves the sharpness of the obtained result. Numerical experiments demonstrate the tightness of the theoretical results.
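To make the setting concrete: in one common form of DSGD, each agent i holds a local iterate x_i and repeats two coupled steps, consensus averaging with its neighbors through a doubly stochastic mixing matrix W = [w_ij], and a local stochastic gradient step, i.e. x_i(k+1) = Σ_j w_ij x_j(k) − α_k g_i(x_i(k)), where g_i is a noisy gradient of the local cost f_i. The Python sketch below runs this update on a small ring network with quadratic (hence strongly convex and smooth) local costs; the topology, weights, step size, and noise model are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

# Minimal DSGD sketch (illustrative assumptions throughout: the ring topology,
# the weights, the O(1/k) step size, and the Gaussian gradient noise are ours,
# not taken from the paper).
#
# n agents cooperatively minimize f(x) = (1/n) * sum_i f_i(x) with
# f_i(x) = 0.5 * (x - b_i)^2, so the global minimizer is mean(b).

rng = np.random.default_rng(0)
n, T, sigma = 10, 20000, 1.0          # agents, iterations, gradient-noise std
b = rng.normal(size=n)                # local targets defining the local costs

# Doubly stochastic mixing matrix W for a ring: each agent averages with its
# two neighbors; rows and columns both sum to 1.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

x = np.zeros(n)                       # one scalar iterate per agent
for k in range(1, T + 1):
    alpha = 1.0 / k                   # diminishing step size for the strongly convex case
    noisy_grad = (x - b) + sigma * rng.normal(size=n)  # grad f_i(x_i) + noise
    x = W @ x - alpha * noisy_grad    # consensus step + local stochastic gradient step

print(f"error of averaged iterate: {abs(x.mean() - b.mean()):.4f}")
```

After enough iterations, the average of the agents' iterates behaves like centralized SGD on the pooled data; the paper's contribution is to pin down how long this transient phase lasts as a function of the network (e.g., its size and the spectral gap of W) before the network-independent rate takes over.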


Figures (PMC image links):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/741d/10241409/1dbee39c946b/nihms-1845323-f0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/741d/10241409/324640b1f036/nihms-1845323-f0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/741d/10241409/30648b7428dc/nihms-1845323-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/741d/10241409/6fb132dcea71/nihms-1845323-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/741d/10241409/8c586caff401/nihms-1845323-f0008.jpg

Similar Articles

1. A Sharp Estimate on the Transient Time of Distributed Stochastic Gradient Descent.
IEEE Trans Automat Contr. 2022 Nov;67(11):5900-5915. doi: 10.1109/tac.2021.3126253. Epub 2021 Nov 9.
2. Asymptotic Network Independence in Distributed Stochastic Optimization for Machine Learning.
IEEE Signal Process Mag. 2020 May;37(3):114-122. doi: 10.1109/msp.2020.2975212. Epub 2020 May 6.
3. Robust Asynchronous Stochastic Gradient-Push: Asymptotically Optimal and Network-Independent Performance for Strongly Convex Functions.
J Mach Learn Res. 2020;21.
4. Decentralized stochastic sharpness-aware minimization algorithm.
Neural Netw. 2024 Aug;176:106325. doi: 10.1016/j.neunet.2024.106325. Epub 2024 Apr 17.
5. Distributed Stochastic Constrained Composite Optimization Over Time-Varying Network With a Class of Communication Noise.
IEEE Trans Cybern. 2023 Jun;53(6):3561-3573. doi: 10.1109/TCYB.2021.3127278. Epub 2023 May 17.
6. Communication-Censored Distributed Stochastic Gradient Descent.
IEEE Trans Neural Netw Learn Syst. 2022 Nov;33(11):6831-6843. doi: 10.1109/TNNLS.2021.3083655. Epub 2022 Oct 27.
7. Duality-Free Methods for Stochastic Composition Optimization.
IEEE Trans Neural Netw Learn Syst. 2019 Apr;30(4):1205-1217. doi: 10.1109/TNNLS.2018.2866699. Epub 2018 Sep 12.
8. Preconditioned Stochastic Gradient Descent.
IEEE Trans Neural Netw Learn Syst. 2018 May;29(5):1454-1466. doi: 10.1109/TNNLS.2017.2672978. Epub 2017 Mar 9.
9. The Strength of Nesterov's Extrapolation in the Individual Convergence of Nonsmooth Optimization.
IEEE Trans Neural Netw Learn Syst. 2020 Jul;31(7):2557-2568. doi: 10.1109/TNNLS.2019.2933452. Epub 2019 Sep 2.
10. Stochastic Gradient Descent for Nonconvex Learning Without Bounded Gradient Assumptions.
IEEE Trans Neural Netw Learn Syst. 2020 Oct;31(10):4394-4400. doi: 10.1109/TNNLS.2019.2952219. Epub 2019 Dec 11.

References Cited in This Article

1. Asymptotic Network Independence in Distributed Stochastic Optimization for Machine Learning.
IEEE Signal Process Mag. 2020 May;37(3):114-122. doi: 10.1109/msp.2020.2975212. Epub 2020 May 6.
2. Robust Asynchronous Stochastic Gradient-Push: Asymptotically Optimal and Network-Independent Performance for Strongly Convex Functions.
J Mach Learn Res. 2020;21.
3. Federated learning of predictive models from federated Electronic Health Records.
Int J Med Inform. 2018 Apr;112:59-67. doi: 10.1016/j.ijmedinf.2018.01.007. Epub 2018 Jan 12.