Fluid Simulations Accelerated With 16 Bits: Approaching 4x Speedup on A64FX by Squeezing ShallowWaters.jl Into Float16

Author information

Klöwer Milan, Hatfield Sam, Croci Matteo, Düben Peter D, Palmer Tim N

Affiliations

Atmospheric, Oceanic and Planetary Physics, University of Oxford, Oxford, UK.

European Centre for Medium-Range Weather Forecasts, Reading, UK.

Publication information

J Adv Model Earth Syst. 2022 Feb;14(2):e2021MS002684. doi: 10.1029/2021MS002684. Epub 2022 Feb 11.

DOI:10.1029/2021MS002684

PMID:35866041

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC9287017/

Abstract

Most Earth-system simulations run on conventional central processing units in 64-bit double-precision floating-point numbers (Float64), although the need for high-precision calculations in the presence of large uncertainties has been questioned. Fugaku, currently the world's fastest supercomputer, is based on A64FX microprocessors, which also support the 16-bit low-precision format Float16. We investigate the Float16 performance on A64FX with ShallowWaters.jl, the first fluid circulation model that runs entirely with 16-bit arithmetic. The model implements techniques that address precision and dynamic range issues in 16 bits. The precision-critical time integration is augmented to include compensated summation to minimize rounding errors. Such a compensated time integration is as precise as, but faster than, mixed precision with 16- and 32-bit floats. As subnormals are inefficiently supported on A64FX, the very limited range available in Float16 is 6 × 10⁻⁵ to 65,504. We develop the analysis-number format Sherlogs.jl to log the arithmetic results during the simulation. The equations in ShallowWaters.jl are then systematically rescaled to fit into Float16, using 97% of the available representable numbers. Consequently, we benchmark speedups of up to 3.8x on A64FX with Float16. Adding a compensated time integration, speedups reach up to 3.6x. Although ShallowWaters.jl is simplified compared to large Earth-system models, it shares essential algorithms and therefore shows that 16-bit calculations are indeed a competitive way to accelerate Earth-system simulations on available hardware.
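
To illustrate why a compensated time integration matters in 16 bits, the following is a minimal sketch, not the code of ShallowWaters.jl. It assumes a Kahan-style compensated summation (the abstract does not name the exact variant), emulates Float16 in Python by rounding every result to half precision with the standard `struct` module, and uses hypothetical function names (`f16`, `integrate_naive`, `integrate_compensated`). A constant tendency `du` stands in for the right-hand side of a real model.

```python
import struct


def f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def integrate_naive(u0: float, du: float, nsteps: int) -> float:
    """Add the increment du to u at every step, rounding to Float16 each time."""
    u, du = f16(u0), f16(du)
    for _ in range(nsteps):
        u = f16(u + du)
    return u


def integrate_compensated(u0: float, du: float, nsteps: int) -> float:
    """Same update, but with a Kahan-style compensation term (assumed variant).

    c stores the rounding error of the previous addition and feeds it back
    into the next increment, so small increments are not lost permanently.
    """
    u, du = f16(u0), f16(du)
    c = 0.0
    for _ in range(nsteps):
        y = f16(du - c)          # increment corrected by the previous error
        t = f16(u + y)           # rounded update of the state
        c = f16(f16(t - u) - y)  # what was actually added minus what was intended
        u = t
    return u


if __name__ == "__main__":
    nsteps = 1000
    reference = 1.0 + nsteps * f16(1e-4)  # evaluated in double precision
    print("reference  :", reference)
    print("naive      :", integrate_naive(1.0, 1e-4, nsteps))
    print("compensated:", integrate_compensated(1.0, 1e-4, nsteps))
```

In this setup the increment is smaller than half the Float16 spacing near 1.0 (about 4.9 × 10⁻⁴), so the naive loop never leaves 1.0, while the compensated loop accumulates the lost bits and stays close to the double-precision reference. The extra cost is a few additional 16-bit operations per step, which is consistent with the abstract's report of a slightly lower speedup once compensation is added.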
