

Deploying and scaling distributed parallel deep neural networks on the Tianhe-3 prototype system.

Affiliation

Xi'an Jiaotong University, Xi'an, 710049, Shaanxi, China.

Publication

Sci Rep. 2021 Oct 12;11(1):20244. doi: 10.1038/s41598-021-98794-z.

DOI:10.1038/s41598-021-98794-z
PMID:34642373
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8511035/
Abstract

Due to the increase in computing power, it is possible to improve the feature extraction and data fitting capabilities of DNNs by increasing their depth and model complexity. However, big data and complex models greatly increase the training overhead of DNNs, so accelerating the training process becomes a key task. The peak speed of Tianhe-3 is designed to reach the exascale (E-class), and this enormous computing power provides a potential opportunity for DNN training. We implement and scale LeNet, AlexNet, VGG, and ResNet model training on single MT-2000+ and FT-2000+ compute nodes as well as on extended multi-node clusters, and propose a Dynamic Allreduce communication optimization strategy for the gradient synchronization process based on the ARM architecture features of the Tianhe-3 prototype, providing experimental data and a theoretical basis for further improving the performance of the Tianhe-3 prototype in large-scale distributed training of neural networks.
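The gradient synchronization step the abstract refers to is typically an Allreduce collective: after each training step, every node must end up holding the sum of all nodes' gradients. As a minimal sketch, the baseline ring Allreduce algorithm can be simulated in-process (no MPI) as below. This is not the paper's Dynamic Allreduce implementation — only an illustration of the standard algorithm it optimizes; the function and variable names are illustrative assumptions.

```python
# Hedged sketch: pure-Python, in-process simulation of ring Allreduce,
# the collective commonly used to synchronize gradients in distributed
# data-parallel DNN training. Illustrative only; not the paper's
# Dynamic Allreduce strategy.

def ring_allreduce(grads):
    """Simulate ring Allreduce over n 'nodes'.

    grads: list of n per-node gradient vectors (all the same length).
    Returns the per-node buffers after synchronization; every node
    ends up holding the element-wise sum of all gradients.
    """
    n = len(grads)
    length = len(grads[0])
    # Partition each vector into n contiguous chunks.
    bounds = [(i * length) // n for i in range(n + 1)]
    buf = [list(g) for g in grads]  # per-node working buffers

    # Phase 1: reduce-scatter. In step s, node r sends chunk (r - s) mod n
    # to node (r + 1) mod n, which accumulates it. Outgoing chunks are
    # snapshotted first to mimic the simultaneous exchanges of a real step.
    for s in range(n - 1):
        msgs = [(r, (r - s) % n) for r in range(n)]
        chunks = {(r, c): buf[r][bounds[c]:bounds[c + 1]] for r, c in msgs}
        for r, c in msgs:
            dst = (r + 1) % n
            for k, v in enumerate(chunks[(r, c)]):
                buf[dst][bounds[c] + k] += v
    # After n - 1 steps, node r owns the fully reduced chunk (r + 1) mod n.

    # Phase 2: all-gather. In step s, node r forwards chunk (r + 1 - s) mod n
    # to node (r + 1) mod n, overwriting, until every node has every chunk.
    for s in range(n - 1):
        msgs = [(r, (r + 1 - s) % n) for r in range(n)]
        chunks = {(r, c): buf[r][bounds[c]:bounds[c + 1]] for r, c in msgs}
        for r, c in msgs:
            dst = (r + 1) % n
            buf[dst][bounds[c]:bounds[c + 1]] = chunks[(r, c)]
    return buf

# ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# -> every node holds [12, 15, 18]
```

Each of the 2(n - 1) steps moves roughly 1/n of the gradient per node, which is why ring Allreduce bandwidth cost is nearly independent of node count — the property that makes it the standard baseline for the communication optimization studied here.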


Figures (PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d51a/8511035/970d2e423e74/41598_2021_98794_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d51a/8511035/ab7942a83ee8/41598_2021_98794_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d51a/8511035/237a392472c0/41598_2021_98794_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d51a/8511035/8aa79a0355fe/41598_2021_98794_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d51a/8511035/8eaf90d92239/41598_2021_98794_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d51a/8511035/9309411a8b03/41598_2021_98794_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d51a/8511035/0c68d4137b78/41598_2021_98794_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d51a/8511035/93a6b8185c8f/41598_2021_98794_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d51a/8511035/b4ba187e8918/41598_2021_98794_Fig11_HTML.jpg

Similar articles

1
Deploying and scaling distributed parallel deep neural networks on the Tianhe-3 prototype system.
Sci Rep. 2021 Oct 12;11(1):20244. doi: 10.1038/s41598-021-98794-z.
2
Dynamic Allocation Method of Economic Information Integrated Data Based on Deep Learning Algorithm.
Comput Intell Neurosci. 2022 May 16;2022:5494123. doi: 10.1155/2022/5494123. eCollection 2022.
3
A Partition Based Gradient Compression Algorithm for Distributed Training in AIoT.
Sensors (Basel). 2021 Mar 10;21(6):1943. doi: 10.3390/s21061943.
4
A Parameter Communication Optimization Strategy for Distributed Machine Learning in Sensors.
Sensors (Basel). 2017 Sep 21;17(10):2172. doi: 10.3390/s17102172.
5
Multi-task learning for the prediction of wind power ramp events with deep neural networks.
Neural Netw. 2020 Mar;123:401-411. doi: 10.1016/j.neunet.2019.12.017. Epub 2020 Jan 7.
6
An Interface for Biomedical Big Data Processing on the Tianhe-2 Supercomputer.
Molecules. 2017 Dec 1;22(12):2116. doi: 10.3390/molecules22122116.
7
Parareal Neural Networks Emulating a Parallel-in-Time Algorithm.
IEEE Trans Neural Netw Learn Syst. 2024 May;35(5):6353-6364. doi: 10.1109/TNNLS.2022.3206797. Epub 2024 May 2.
8
A Heterogeneous RISC-V Processor for Efficient DNN Application in Smart Sensing System.
Sensors (Basel). 2021 Sep 28;21(19):6491. doi: 10.3390/s21196491.
9
Multistructure-Based Collaborative Online Distillation.
Entropy (Basel). 2019 Apr 2;21(4):357. doi: 10.3390/e21040357.
10
Deep-learning-based automatic computer-aided diagnosis system for diabetic retinopathy.
Biomed Eng Lett. 2017 Aug 31;8(1):41-57. doi: 10.1007/s13534-017-0047-y. eCollection 2018 Feb.

Cited by

1
Distributed search and fusion for wine label image retrieval.
PeerJ Comput Sci. 2022 Sep 28;8:e1116. doi: 10.7717/peerj-cs.1116. eCollection 2022.

References

1
High-Scalable Collaborated Parallel Framework for Large-Scale Molecular Dynamic Simulation on Tianhe-2 Supercomputer.
IEEE/ACM Trans Comput Biol Bioinform. 2020 May-Jun;17(3):804-816. doi: 10.1109/TCBB.2018.2805709. Epub 2018 Feb 13.
2
Dermatologist-level classification of skin cancer with deep neural networks.
Nature. 2017 Feb 2;542(7639):115-118. doi: 10.1038/nature21056. Epub 2017 Jan 25.
3
Convolutional networks for fast, energy-efficient neuromorphic computing.
Proc Natl Acad Sci U S A. 2016 Oct 11;113(41):11441-11446. doi: 10.1073/pnas.1604850113. Epub 2016 Sep 20.
4
Mastering the game of Go with deep neural networks and tree search.
Nature. 2016 Jan 28;529(7587):484-9. doi: 10.1038/nature16961.
5
Deep learning.
Nature. 2015 May 28;521(7553):436-44. doi: 10.1038/nature14539.
6
More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server.
Adv Neural Inf Process Syst. 2013;2013:1223-1231.