Zounon Mawussi, Higham Nicholas J, Lucas Craig, Tisseur Françoise
School of Mathematics, University of Manchester, Manchester, United Kingdom.
The Numerical Algorithms Group, Manchester, Greater Manchester, United Kingdom.
PeerJ Comput Sci. 2022 Jan 17;8:e778. doi: 10.7717/peerj-cs.778. eCollection 2022.
It is well established that reduced precision arithmetic can be exploited to accelerate the solution of dense linear systems. Typical examples are mixed precision algorithms that reduce the execution time and the energy consumption of parallel solvers for dense linear systems by factorizing a matrix at a precision lower than the working precision. Much less is known about the efficiency of reduced precision in parallel solvers for sparse linear systems, and existing work focuses on single core experiments. We evaluate the benefits of using single precision arithmetic in solving a double precision sparse linear system using multiple cores. We consider both direct and iterative methods, and we focus on using single precision for the key components of LU factorization and matrix-vector products. Our results show that the anticipated speedup of 2 over a double precision LU factorization is obtained only for the very largest of our test problems. We point out two key factors underlying the poor speedup. First, we find that single precision sparse LU factorization is prone to a severe loss of performance due to the intrusion of subnormal numbers. We identify a mechanism that allows cascading fill-ins to generate subnormal numbers and show that automatically flushing subnormals to zero avoids the performance penalties. The second factor is the lack of parallelism in the analysis and reordering phases of the solvers and the absence of floating-point arithmetic in these phases. For iterative solvers, we find that for the majority of the matrices, computing or applying incomplete factorization preconditioners in single precision provides at best modest performance benefits compared with the use of double precision. We also find that using single precision for the matrix-vector product kernels provides an average speedup of 1.5 over double precision kernels. In both cases some form of refinement is needed to raise the single precision results to double precision accuracy, which will reduce performance gains.