

Similar Articles

1. mL-BFGS: A Momentum-based L-BFGS for Distributed Large-Scale Neural Network Optimization. Transact Mach Learn Res. 2023 Aug;2023.
2. Prescription of Controlled Substances: Benefits and Risks.
3. Stabilizing machine learning for reproducible and explainable results: A novel validation approach to subject-specific insights. Comput Methods Programs Biomed. 2025 Jun 21;269:108899. doi: 10.1016/j.cmpb.2025.108899.
4. Asynchronous Parallel Stochastic Quasi-Newton Methods. Parallel Comput. 2021 Apr;101. doi: 10.1016/j.parco.2020.102721. Epub 2020 Nov 4.
5. Uterotonic agents for preventing postpartum haemorrhage: a network meta-analysis. Cochrane Database Syst Rev. 2018 Apr 25;4(4):CD011689. doi: 10.1002/14651858.CD011689.pub2.
6. Local anaesthesia for pain control in first trimester surgical abortion. Cochrane Database Syst Rev. 2024 Feb 13;2(2):CD006712. doi: 10.1002/14651858.CD006712.pub3.
7. Uterotonic agents for preventing postpartum haemorrhage: a network meta-analysis. Cochrane Database Syst Rev. 2025 Apr 16;4(4):CD011689. doi: 10.1002/14651858.CD011689.pub4.
8. Oxytocin for preventing postpartum haemorrhage (PPH) in non-facility birth settings. Cochrane Database Syst Rev. 2016 Apr 14;4(4):CD011491. doi: 10.1002/14651858.CD011491.pub2.
9. Predicting the risk of threatened abortion using machine learning methods: a comparative study. BMC Pregnancy Childbirth. 2025 Aug 30;25(1):901. doi: 10.1186/s12884-025-08030-z.
10. Tranexamic acid for preventing postpartum haemorrhage after caesarean section. Cochrane Database Syst Rev. 2024 Nov 13;11(11):CD016278. doi: 10.1002/14651858.CD016278.


mL-BFGS: A Momentum-based L-BFGS for Distributed Large-Scale Neural Network Optimization.

Authors

Yue Niu, Zalan Fabian, Sunwoo Lee, Mahdi Soltanolkotabi, Salman Avestimehr

Affiliations

Department of Electrical and Computer Engineering, University of Southern California.

Department of Computer Science and Engineering, Inha University.

Publication

Transact Mach Learn Res. 2023 Aug;2023.

PMID: 40896294
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12393816/
Abstract

Quasi-Newton methods still face significant challenges in training large-scale neural networks due to the additional compute cost of Hessian-related computations and instability issues in stochastic training. A well-known method, L-BFGS, which efficiently approximates the Hessian using historical parameter and gradient changes, suffers from convergence instability in stochastic training. So far, attempts to adapt L-BFGS to large-scale stochastic training have incurred considerable extra overhead, which offsets its convergence benefits in wall-clock time. In this paper, we propose mL-BFGS, a lightweight momentum-based L-BFGS algorithm that paves the way for quasi-Newton (QN) methods in large-scale distributed deep neural network (DNN) optimization. mL-BFGS introduces a nearly cost-free momentum scheme into the L-BFGS update and greatly reduces stochastic noise in the Hessian estimate, thereby stabilizing convergence during stochastic optimization. For model training at large scale, mL-BFGS approximates a block-wise Hessian, enabling compute and memory costs to be distributed across all computing nodes. We provide a supporting convergence analysis for mL-BFGS in stochastic settings. To investigate mL-BFGS's potential in large-scale DNN training, we train benchmark neural models using mL-BFGS and compare performance with baselines (SGD, Adam, and other quasi-Newton methods). Results show that mL-BFGS achieves noticeable speedups in both iteration count and wall-clock time.
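The momentum idea in the abstract can be illustrated with a small sketch: a standard L-BFGS two-loop recursion whose curvature pairs (s, y) are smoothed with an exponential moving average before being stored, damping step-to-step noise. This is a hedged illustration only — the function names, the EMA form of the momentum, and the hyperparameters (`lr`, `beta`, memory size `m`) are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def two_loop_direction(grad, s_list, y_list):
    """Standard L-BFGS two-loop recursion: returns -(H_approx @ grad)."""
    q = grad.copy()
    alphas = []
    for s, y in zip(reversed(s_list), reversed(y_list)):
        a = (s @ q) / (y @ s)
        alphas.append(a)
        q -= a * y
    if s_list:  # scale by the usual initial-Hessian estimate gamma
        s, y = s_list[-1], y_list[-1]
        q *= (s @ y) / (y @ y)
    for (s, y), a in zip(zip(s_list, y_list), reversed(alphas)):
        b = (y @ q) / (y @ s)
        q += (a - b) * s
    return -q

def ml_bfgs_sketch(grad_fn, x0, steps=300, lr=0.2, m=10, beta=0.9):
    """Hypothetical momentum-based L-BFGS sketch: parameter and gradient
    differences are EMA-smoothed before forming curvature pairs, mimicking
    the noise-damping idea described in the abstract."""
    x = x0.copy()
    s_list, y_list = [], []
    s_mom = np.zeros_like(x)  # momentum buffers for the (s, y) pairs
    y_mom = np.zeros_like(x)
    g_prev, x_prev = grad_fn(x), x.copy()
    for _ in range(steps):
        d = two_loop_direction(g_prev, s_list, y_list)
        x = x + lr * d
        g = grad_fn(x)
        # momentum-smoothed curvature pairs (assumption: simple EMA form)
        s_mom = beta * s_mom + (1 - beta) * (x - x_prev)
        y_mom = beta * y_mom + (1 - beta) * (g - g_prev)
        if s_mom @ y_mom > 1e-10:  # curvature condition before storing
            s_list.append(s_mom.copy())
            y_list.append(y_mom.copy())
            if len(s_list) > m:
                s_list.pop(0)
                y_list.pop(0)
        x_prev, g_prev = x.copy(), g
    return x
```

On a deterministic convex quadratic the smoothed pairs remain valid secant pairs (the EMA commutes with the linear map), so the sketch converges; in stochastic training the EMA is what damps minibatch noise in the stored history.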

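The block-wise Hessian idea — distributing curvature memory across computing nodes — can likewise be sketched by giving each parameter block its own independent L-BFGS history, so a worker only stores history of size O(m × block size) for its own shard. The class and function names here are hypothetical, and the contiguous sharding is an assumption; the paper's actual partitioning and communication scheme may differ.

```python
import numpy as np

def split_blocks(n, n_blocks):
    """Partition parameter indices 0..n-1 into contiguous blocks (one per worker)."""
    return np.array_split(np.arange(n), n_blocks)

class BlockLBFGSState:
    """Per-block curvature history; cross-block Hessian terms are ignored."""
    def __init__(self, m=5):
        self.m = m
        self.s, self.y = [], []

    def push(self, s, y):
        if s @ y > 1e-12:  # curvature condition for this block
            self.s.append(s)
            self.y.append(y)
            if len(self.s) > self.m:
                self.s.pop(0)
                self.y.pop(0)

    def direction(self, g):
        # two-loop recursion restricted to this block's coordinates
        q = g.copy()
        alphas = []
        for s, y in zip(reversed(self.s), reversed(self.y)):
            a = (s @ q) / (y @ s)
            alphas.append(a)
            q -= a * y
        if self.s:
            q *= (self.s[-1] @ self.y[-1]) / (self.y[-1] @ self.y[-1])
        for (s, y), a in zip(zip(self.s, self.y), reversed(alphas)):
            q += (a - (y @ q) / (y @ s)) * s
        return -q

def block_lbfgs_step(x, g, blocks, states, lr=0.2):
    """Each worker computes its block's quasi-Newton direction independently."""
    d = np.zeros_like(x)
    for idx, st in zip(blocks, states):
        d[idx] = st.direction(g[idx])
    return x + lr * d
```

The block-diagonal approximation is exact when the true Hessian has no cross-block coupling, and otherwise trades some curvature information for the memory and compute savings the abstract describes.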