
Scalable and Practical Natural Gradient for Large-Scale Deep Learning.

Publication Information

IEEE Trans Pattern Anal Mach Intell. 2022 Jan;44(1):404-415. doi: 10.1109/TPAMI.2020.3004354. Epub 2021 Dec 7.

DOI: 10.1109/TPAMI.2020.3004354
PMID: 32750792
Abstract

Large-scale distributed training of deep neural networks results in models with worse generalization performance as a result of the increase in the effective mini-batch size. Previous approaches attempt to address this problem by varying the learning rate and batch size over epochs and layers, or ad hoc modifications of batch normalization. We propose scalable and practical natural gradient descent (SP-NGD), a principled approach for training models that allows them to attain similar generalization performance to models trained with first-order optimization methods, but with accelerated convergence. Furthermore, SP-NGD scales to large mini-batch sizes with a negligible computational overhead as compared to first-order methods. We evaluated SP-NGD on a benchmark task where highly optimized first-order methods are available as references: training a ResNet-50 model for image classification on ImageNet. We demonstrate convergence to a top-1 validation accuracy of 75.4 percent in 5.5 minutes using a mini-batch size of 32,768 with 1,024 GPUs, as well as an accuracy of 74.9 percent with an extremely large mini-batch size of 131,072 in 873 steps of SP-NGD.
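For context, the natural-gradient update the abstract refers to preconditions the loss gradient with the inverse Fisher information matrix, w ← w − lr · F⁻¹∇L(w). The sketch below is a minimal, hypothetical NumPy illustration of that update on a toy logistic-regression problem using a damped empirical Fisher; it is not the authors' SP-NGD implementation, which additionally relies on Fisher approximations and distributed computation to keep the overhead negligible at large mini-batch sizes.

```python
# Minimal natural-gradient-descent sketch on a toy logistic-regression problem.
# Assumption: illustrative only -- NOT the SP-NGD implementation from the paper,
# just the basic update  w <- w - lr * F^{-1} g  with a damped empirical Fisher.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 5))                      # one mini-batch of inputs
y = (X @ rng.normal(size=5) > 0).astype(float)     # synthetic binary labels
w = np.zeros(5)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr, damping = 0.3, 1e-3
for _ in range(30):
    p = sigmoid(X @ w)
    g = X.T @ (p - y) / len(y)                     # mean gradient of the logistic loss
    per_example = X * (p - y)[:, None]             # per-example gradients
    F = per_example.T @ per_example / len(y) + damping * np.eye(w.size)
    w -= lr * np.linalg.solve(F, g)                # natural-gradient step: F^{-1} g

print("training accuracy:", ((sigmoid(X @ w) > 0.5) == y).mean())
```

The exact Fisher used above scales quadratically with the number of parameters, so methods such as SP-NGD work with approximations of it rather than the full matrix; that is what makes the approach practical for networks like ResNet-50.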


Similar Articles

1. Scalable and Practical Natural Gradient for Large-Scale Deep Learning.
IEEE Trans Pattern Anal Mach Intell. 2022 Jan;44(1):404-415. doi: 10.1109/TPAMI.2020.3004354. Epub 2021 Dec 7.
2. Towards accelerating model parallelism in distributed deep learning systems.
PLoS One. 2023 Nov 2;18(11):e0293338. doi: 10.1371/journal.pone.0293338. eCollection 2023.
3. Application of cluster repeated mini-batch training method to classify electroencephalography for grab and lift tasks.
Med Eng Phys. 2023 Oct;120:104041. doi: 10.1016/j.medengphy.2023.104041. Epub 2023 Aug 23.
4. Achieving small-batch accuracy with large-batch scalability via Hessian-aware learning rate adjustment.
Neural Netw. 2023 Jan;158:1-14. doi: 10.1016/j.neunet.2022.11.007. Epub 2022 Nov 11.
5. ADAMT: Adaptive distributed multi-task learning for efficient image recognition in Mobile Ad-hoc Networks.
Neural Netw. 2025 Jul;187:107316. doi: 10.1016/j.neunet.2025.107316. Epub 2025 Mar 6.
6. Lightweight multi-scale classification of chest radiographs via size-specific batch normalization.
Comput Methods Programs Biomed. 2023 Jun;236:107558. doi: 10.1016/j.cmpb.2023.107558. Epub 2023 Apr 18.
7. HDL-ACO hybrid deep learning and ant colony optimization for ocular optical coherence tomography image classification.
Sci Rep. 2025 Feb 18;15(1):5888. doi: 10.1038/s41598-025-89961-7.
8. Brain tumor segmentation and detection in MRI using convolutional neural networks and VGG16.
Cancer Biomark. 2025 Mar;42(3):18758592241311184. doi: 10.1177/18758592241311184. Epub 2025 Apr 4.
9. Deep learning for computational structural optimization.
ISA Trans. 2020 Aug;103:177-191. doi: 10.1016/j.isatra.2020.03.033. Epub 2020 Apr 10.
10. Optimizing neural networks for medical data sets: A case study on neonatal apnea prediction.
Artif Intell Med. 2019 Jul;98:59-76. doi: 10.1016/j.artmed.2019.07.008. Epub 2019 Jul 25.

Cited By

1. Development and validation of an abnormality-derived deep-learning diagnostic system for major respiratory diseases.
NPJ Digit Med. 2022 Aug 23;5(1):124. doi: 10.1038/s41746-022-00648-z.