
Efficient Mixed-Precision Matrix Factorization of the Inverse Overlap Matrix in Electronic Structure Calculations with AI-Hardware and GPUs.

Author Information

Habib Adela, Finkelstein Joshua, Niklasson Anders M N

Affiliations

Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States.

Publication Information

J Chem Theory Comput. 2024 Aug 13. doi: 10.1021/acs.jctc.4c00584.

Abstract

In recent years, a new kind of accelerated hardware has gained popularity in the artificial intelligence (AI) community, enabling extremely high-performance tensor contractions in reduced precision for deep neural network calculations. In this article, we exploit Nvidia Tensor cores, a prototypical example of such AI hardware, to develop a mixed-precision approach for computing a dense matrix factorization of the inverse overlap matrix in electronic structure theory, S⁻¹. This factorization of S⁻¹, written as S⁻¹ = ZZᵀ, is used to transform the general matrix eigenvalue problem into a standard matrix eigenvalue problem. Here we present a mixed-precision iterative refinement algorithm in which Z is given recursively using matrix-matrix multiplications and can be computed with high performance on Tensor cores. To understand the performance and accuracy of Tensor cores, comparisons are made to GPU-only implementations in single and double precision. Additionally, we propose a nonparametric stopping criterion that is robust in the face of lower-precision floating-point operations. The algorithm is particularly useful when a good initial guess to Z is available, for example, from previous time steps in quantum-mechanical molecular dynamics simulations or from a previous iteration in a geometry optimization.
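
The recursion that defines Z is not spelled out in the abstract. Purely as an illustration, the NumPy sketch below assumes the first-order iterative refinement update Z ← Z(I + ½(I − ZᵀSZ)) known from earlier work by Niklasson and co-workers, uses float32 matrix products as a stand-in for the Tensor-core mixed-precision path, and replaces the paper's nonparametric stopping criterion with a simple residual-stagnation test; the function name, test matrix, and initial guess are invented for the example and are not taken from the paper.

import numpy as np

def refine_inverse_factor(S, Z, max_iter=100):
    # Refine Z so that Z Z^T approximates S^{-1}, i.e. Z^T S Z -> I.
    n = S.shape[0]
    I = np.eye(n)
    Z = np.array(Z, dtype=np.float64)
    prev_err = np.inf
    for _ in range(max_iter):
        # Reduced-precision products: float32 stands in for the Tensor-core path,
        # with the residual carried back in double precision.
        Z32 = Z.astype(np.float32)
        S32 = S.astype(np.float32)
        delta = I - (Z32.T @ S32 @ Z32).astype(np.float64)  # I - Z^T S Z
        err = np.linalg.norm(delta, ord="fro")
        # Parameter-free stop: quit once the residual no longer decreases,
        # i.e. the iteration has reached the floating-point noise floor.
        if err >= prev_err:
            break
        prev_err = err
        # First-order refinement step: Z <- Z (I + delta/2).
        Z = Z + 0.5 * (Z32 @ delta.astype(np.float32)).astype(np.float64)
    return Z

# Small synthetic test: a symmetric positive-definite "overlap" matrix close to the
# identity and a cheap diagonal initial guess (in molecular dynamics, Z from the
# previous time step would serve as the initial guess instead).
rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n))
S = np.eye(n) + 0.01 * (A + A.T)
Z0 = np.diag(1.0 / np.sqrt(np.diag(S)))
Z = refine_inverse_factor(S, Z0)
print(np.linalg.norm(Z @ Z.T - np.linalg.inv(S)))  # small residual, limited by the float32 products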


Similar Articles

Quantum-Based Molecular Dynamics Simulations Using Tensor Cores.
J Chem Theory Comput. 2021 Oct 12;17(10):6180-6192. doi: 10.1021/acs.jctc.1c00726. Epub 2021 Oct 1.

Quantum Perturbation Theory Using Tensor Cores and a Deep Neural Network.
J Chem Theory Comput. 2022 Jul 12;18(7):4255-4268. doi: 10.1021/acs.jctc.2c00274. Epub 2022 Jun 7.

Numerical behavior of NVIDIA tensor cores.
PeerJ Comput Sci. 2021 Feb 10;7:e330. doi: 10.7717/peerj-cs.330. eCollection 2021.

Acceleration of Approximate Matrix Multiplications on GPUs.
Entropy (Basel). 2023 Jul 27;25(8):1130. doi: 10.3390/e25081130.
