Haidar Azzam, Bayraktar Harun, Tomov Stanimire, Dongarra Jack, Higham Nicholas J
NVIDIA, Santa Clara, CA, USA.
Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA.
Proc Math Phys Eng Sci. 2020 Nov;476(2243):20200110. doi: 10.1098/rspa.2020.0110. Epub 2020 Nov 25.
Double-precision floating-point arithmetic (FP64) has been the de facto standard for engineering and scientific simulations for several decades. Problem complexity and the sheer volume of data coming from various instruments and sensors motivate researchers to mix and match various approaches to optimize compute resources, including different levels of floating-point precision. In recent years, machine learning has motivated hardware support for half-precision floating-point arithmetic. A primary challenge in high-performance computing is to leverage reduced-precision and mixed-precision hardware. We show how the FP16/FP32 Tensor Cores on NVIDIA GPUs can be exploited to accelerate the solution of linear systems of equations Ax = b without sacrificing numerical stability. The techniques we employ include multiprecision LU factorization, the preconditioned generalized minimal residual algorithm (GMRES), and scaling and auto-adaptive rounding to avoid overflow. We also show how to handle systems with multiple right-hand sides efficiently. On the NVIDIA Quadro GV100 (Volta) GPU, we achieve a performance increase and 5× better energy efficiency versus the standard FP64 implementation while maintaining an FP64 level of numerical stability.
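A minimal sketch of the mixed-precision iterative-refinement idea described in the abstract, written in NumPy/SciPy on the CPU for illustration only: FP32 stands in for the FP16/FP32 Tensor Core LU factorization, the refinement loop runs in FP64, and GMRES preconditioned by the low-precision factors solves the correction equation. The function name `mixed_precision_solve`, the simple max-entry scaling, and the use of `scipy.sparse.linalg.gmres` are assumptions for this sketch, not the authors' GPU implementation, which also includes auto-adaptive rounding and multiple right-hand-side handling not reproduced here.

```python
# Sketch: mixed-precision iterative refinement for A x = b.
# FP32 stands in for the FP16/FP32 Tensor Core LU; residual/update are FP64.
import numpy as np
import scipy.linalg as la
from scipy.sparse.linalg import LinearOperator, gmres


def mixed_precision_solve(A, b, tol=1e-12, max_iters=50):
    """Solve A x = b: factorize in low precision, refine in FP64 with GMRES."""
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)

    # Illustrative scaling so entries fit comfortably in the low-precision range.
    theta = np.max(np.abs(A64))
    A_low = (A64 / theta).astype(np.float32)   # stand-in for the Tensor Core LU input
    lu, piv = la.lu_factor(A_low)

    def apply_prec(v):
        # Approximate A^{-1} v using the low-precision LU factors of A/theta.
        return la.lu_solve((lu, piv), v.astype(np.float32)).astype(np.float64) / theta

    M = LinearOperator(A64.shape, matvec=apply_prec, dtype=np.float64)

    x = apply_prec(b64)                         # initial low-precision solve
    for _ in range(max_iters):
        r = b64 - A64 @ x                       # FP64 residual
        if np.linalg.norm(r) <= tol * np.linalg.norm(b64):
            break
        # GMRES on the correction equation A d = r, preconditioned by the LU factors.
        d, _ = gmres(A64, r, M=M, maxiter=20)
        x += d                                  # FP64 update
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
    x_true = rng.standard_normal(n)
    b = A @ x_true
    x = mixed_precision_solve(A, b)
    print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```

Under these assumptions the factorization cost is paid in the cheap, fast format, while each refinement step costs only an FP64 residual and a preconditioned GMRES solve, which is what allows the method to retain FP64-level accuracy.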