Fluid Simulations Accelerated With 16 Bits: Approaching 4x Speedup on A64FX by Squeezing ShallowWaters.jl Into Float16.

Authors

Klöwer Milan, Hatfield Sam, Croci Matteo, Düben Peter D, Palmer Tim N

Affiliations

Atmospheric, Oceanic and Planetary Physics, University of Oxford, Oxford, UK.

European Centre for Medium-Range Weather Forecasts, Reading, UK.

Publication information

J Adv Model Earth Syst. 2022 Feb;14(2):e2021MS002684. doi: 10.1029/2021MS002684. Epub 2022 Feb 11.

DOI:10.1029/2021MS002684
PMID:35866041
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9287017/
Abstract

Most Earth-system simulations run on conventional central processing units in 64-bit double-precision floating-point numbers (Float64), although the need for high-precision calculations in the presence of large uncertainties has been questioned. Fugaku, currently the world's fastest supercomputer, is based on A64FX microprocessors, which also support the 16-bit low-precision format Float16. We investigate the Float16 performance on A64FX with ShallowWaters.jl, the first fluid circulation model that runs entirely with 16-bit arithmetic. The model implements techniques that address precision and dynamic range issues in 16 bits. The precision-critical time integration is augmented to include compensated summation to minimize rounding errors. Such a compensated time integration is as precise as, but faster than, mixed precision with 16-bit and 32-bit floats. As subnormals are inefficiently supported on A64FX, the very limited range available in Float16 is 6 × 10⁻⁵ to 65,504. We develop the analysis-number format Sherlogs.jl to log the arithmetic results during the simulation. The equations in ShallowWaters.jl are then systematically rescaled to fit into Float16, using 97% of the available representable numbers. Consequently, we benchmark speedups of up to 3.8x on A64FX with Float16. Adding a compensated time integration, speedups reach up to 3.6x. Although ShallowWaters.jl is simplified compared to large Earth-system models, it shares essential algorithms and therefore shows that 16-bit calculations are indeed a competitive way to accelerate Earth-system simulations on available hardware.
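
The idea behind the compensated time integration can be illustrated with a minimal Python sketch. This is not the ShallowWaters.jl code (which is written in Julia) and not the paper's time-stepping scheme: it assumes a toy problem with a constant tendency, emulates Float16 rounding with the standard library's half-precision `struct` format, and applies Kahan-style compensated summation. The names `f16`, `integrate_naive`, and `integrate_compensated`, as well as the step count and increment, are hypothetical and chosen only for illustration.

```python
import struct

F16_MAX = 65504.0            # largest finite Float16 value
F16_MIN_NORMAL = 2.0 ** -14  # smallest normal Float16 value, about 6.1e-5


def f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def integrate_naive(u0: float, du: float, nsteps: int) -> float:
    """Add the increment du to u at every step, rounding to Float16 each time."""
    u, du = f16(u0), f16(du)
    for _ in range(nsteps):
        u = f16(u + du)
    return u


def integrate_compensated(u0: float, du: float, nsteps: int) -> float:
    """Same update, but carry the rounding error forward (Kahan summation)."""
    u, du = f16(u0), f16(du)
    c = 0.0  # running compensation, itself kept in Float16
    for _ in range(nsteps):
        y = f16(du - c)          # increment corrected by the previous error
        t = f16(u + y)           # rounded update; low-order bits of y are lost
        c = f16(f16(t - u) - y)  # recover what was lost in the line above
        u = t
    return u


if __name__ == "__main__":
    # Near 1.0 the Float16 spacing is 2**-10, so an increment of 1e-4 is
    # rounded away entirely in the naive loop and u stays at 1.0.
    print("naive:      ", integrate_naive(1.0, 1e-4, 1000))
    # The compensated loop accumulates the lost bits and ends close to 1.1.
    print("compensated:", integrate_compensated(1.0, 1e-4, 1000))
```

In this toy setting the naive Float16 loop stagnates because each increment is smaller than half the spacing between representable numbers, while the compensated loop tracks the exact sum to within a few Float16 spacings, using only 16-bit values for both the state and the compensation term.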


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9925/9287017/742953a07605/JAME-14-0-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9925/9287017/771b5acd73d4/JAME-14-0-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9925/9287017/07773a238d6e/JAME-14-0-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9925/9287017/ba89aa8debf1/JAME-14-0-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9925/9287017/68564574d48d/JAME-14-0-g004.jpg

Similar articles

1
Fluid Simulations Accelerated With 16 Bits: Approaching 4x Speedup on A64FX by Squeezing ShallowWaters.jl Into Float16.
J Adv Model Earth Syst. 2022 Feb;14(2):e2021MS002684. doi: 10.1029/2021MS002684. Epub 2022 Feb 11.
2
Number Formats, Error Mitigation, and Scope for 16-Bit Arithmetics in Weather and Climate Modeling Analyzed With a Shallow Water Model.
J Adv Model Earth Syst. 2020 Oct;12(10):e2020MS002246. doi: 10.1029/2020MS002246. Epub 2020 Oct 14.
3
Porting and Optimizing BWA-MEM2 Using the Fujitsu A64FX Processor.
IEEE/ACM Trans Comput Biol Bioinform. 2023 Sep-Oct;20(5):3139-3153. doi: 10.1109/TCBB.2023.3264514. Epub 2023 Oct 9.
4
Hybrid Precision Floating-Point (HPFP) Selection to Optimize Hardware-Constrained Accelerator for CNN Training.
Sensors (Basel). 2024 Mar 27;24(7):2145. doi: 10.3390/s24072145.
5
Fast Approximations of Activation Functions in Deep Neural Networks when using Posit Arithmetic.
Sensors (Basel). 2020 Mar 10;20(5):1515. doi: 10.3390/s20051515.
6
Periodic orbits in chaotic systems simulated at low precision.
Sci Rep. 2023 Jul 14;13(1):11410. doi: 10.1038/s41598-023-37004-4.
7
Stochastic rounding and reduced-precision fixed-point arithmetic for solving neural ordinary differential equations.
Philos Trans A Math Phys Eng Sci. 2020 Mar 6;378(2166):20190052. doi: 10.1098/rsta.2019.0052. Epub 2020 Jan 20.
8
Routine Microsecond Molecular Dynamics Simulations with AMBER on GPUs. 1. Generalized Born.
J Chem Theory Comput. 2012 May 8;8(5):1542-1555. doi: 10.1021/ct200909j. Epub 2012 Mar 26.
9
Accuracy and performance of the lattice Boltzmann method with 64-bit, 32-bit, and customized 16-bit number formats.
Phys Rev E. 2022 Jul;106(1-2):015308. doi: 10.1103/PhysRevE.106.015308.
10
Mixed-precision weights network for field-programmable gate array.
PLoS One. 2021 May 10;16(5):e0251329. doi: 10.1371/journal.pone.0251329. eCollection 2021.

Cited by

1
Periodic orbits in chaotic systems simulated at low precision.
Sci Rep. 2023 Jul 14;13(1):11410. doi: 10.1038/s41598-023-37004-4.
2
Mixed-Precision for Linear Solvers in Global Geophysical Flows.
J Adv Model Earth Syst. 2022 Sep;14(9):e2022MS003148. doi: 10.1029/2022MS003148. Epub 2022 Sep 10.
