Computing the Density Matrix in Electronic Structure Theory on Graphics Processing Units.

Authors

Cawkwell M J, Sanville E J, Mniszewski S M, Niklasson Anders M N

Affiliations

Theoretical Division and Computer, Computational, and Statistical Sciences Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States.

Publication information

J Chem Theory Comput. 2012 Nov 13;8(11):4094-101. doi: 10.1021/ct300442w. Epub 2012 Oct 8.

DOI:10.1021/ct300442w

PMID:26605576

Abstract

The self-consistent solution of a Schrödinger-like equation for the density matrix is a critical and computationally demanding step in quantum-based models of interatomic bonding. Historically, this step was tackled via diagonalization of the Hamiltonian. We have investigated the performance and accuracy of the second-order spectral projection (SP2) algorithm, which computes the density matrix via a recursive expansion of the Fermi operator in a series of generalized matrix-matrix multiplications. We demonstrate that, owing to its simplicity, the SP2 algorithm [Niklasson, A. M. N. Phys. Rev. B 2002, 66, 155115] is exceptionally well suited to implementation on graphics processing units (GPUs). In both double and single precision arithmetic, the performance of a hybrid GPU/central processing unit (CPU) implementation and of a full GPU implementation of the SP2 algorithm exceeds that of a CPU-only implementation of the SP2 algorithm and of traditional matrix diagonalization when the dimensions of the matrices exceed about 2000 × 2000. Padding schemes for arrays allocated in GPU memory that optimize the performance of the CUBLAS implementations of the level 3 BLAS DGEMM and SGEMM subroutines for generalized matrix-matrix multiplications are described in detail. Analysis of the relative performance of the hybrid CPU/GPU and full GPU implementations indicates that the transfer of arrays between the GPU and CPU constitutes only a small fraction of the total computation time. The errors measured in the self-consistent density matrices computed using the SP2 algorithm are generally smaller than those measured in matrices computed via diagonalization. Furthermore, the errors in the density matrices computed using the SP2 algorithm do not exhibit any dependence on system size, whereas the errors increase linearly with the number of orbitals when diagonalization is employed.
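The abstract does not spell out the recursion, so the following is only a minimal sketch of the SP2 idea as it is commonly stated for the cited Niklasson (2002) scheme: rescale the Hamiltonian so its spectrum lies in [0, 1], then repeatedly apply either X² or 2X − X², choosing whichever brings the trace closer to the number of occupied orbitals. It uses NumPy on the CPU rather than the GPU code evaluated in the paper. The function names, the Gershgorin estimate of the spectral bounds, and the convergence test on tr(X − X²) are assumptions made for illustration.

```python
import numpy as np


def gershgorin_bounds(h):
    """Estimate lower/upper bounds on the spectrum of a symmetric matrix
    using Gershgorin circles (an assumed, inexpensive choice of bounds)."""
    diag = np.diag(h)
    radii = np.sum(np.abs(h), axis=1) - np.abs(diag)
    return float(np.min(diag - radii)), float(np.max(diag + radii))


def sp2_density_matrix(h, n_occ, tol=1e-10, max_iter=100):
    """Hypothetical helper: zero-temperature density matrix by SP2 recursion.

    h      symmetric Hamiltonian in an orthogonal basis
    n_occ  number of occupied orbitals (target trace of the result)
    """
    n = h.shape[0]
    e_min, e_max = gershgorin_bounds(h)
    # Map the spectrum into [0, 1], with the lowest energies closest to 1.
    x = (e_max * np.eye(n) - h) / (e_max - e_min)
    for _ in range(max_iter):
        x2 = x @ x  # the one matrix-matrix multiplication per iteration
        tr_x, tr_x2 = np.trace(x), np.trace(x2)
        # tr(X - X^2) vanishes only when X is idempotent (a projector).
        if abs(tr_x - tr_x2) < tol:
            break
        # Keep whichever projection moves the trace closer to n_occ.
        if abs(tr_x2 - n_occ) < abs(2.0 * tr_x - tr_x2 - n_occ):
            x = x2
        else:
            x = 2.0 * x - x2
    return x


if __name__ == "__main__":
    # Small test Hamiltonian with a clear gap between occupied and empty levels.
    rng = np.random.default_rng(0)
    q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
    h = q @ np.diag([-3.0, -2.5, -2.0, 1.0, 1.5, 2.0]) @ q.T
    h = 0.5 * (h + h.T)
    p = sp2_density_matrix(h, n_occ=3)
    # Reference from diagonalization: projector onto the 3 lowest eigenvectors.
    _, c = np.linalg.eigh(h)
    p_ref = c[:, :3] @ c[:, :3].T
    print("max deviation from diagonalization:", np.max(np.abs(p - p_ref)))
```

Each iteration costs a single dense matrix product, which is the operation the abstract identifies as mapping well onto DGEMM and SGEMM calls on a GPU.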

Similar articles

Computation of the Density Matrix in Electronic Structure Theory in Parallel on Multiple Graphics Processing Units.

J Chem Theory Comput. 2014 Dec 9;10(12):5391-6. doi: 10.1021/ct5008229.

Efficient parallel linear scaling construction of the density matrix for Born-Oppenheimer molecular dynamics.

J Chem Theory Comput. 2015 Oct 13;11(10):4644-54. doi: 10.1021/acs.jctc.5b00552. Epub 2015 Sep 29.

Accelerating Correlated Quantum Chemistry Calculations Using Graphical Processing Units and a Mixed Precision Matrix Multiplication Library.

J Chem Theory Comput. 2010 Jan 12;6(1):135-44. doi: 10.1021/ct900543q.

GPU algorithms for density matrix methods on MOPAC: linear scaling electronic structure calculations for large molecular systems.

J Mol Model. 2020 Oct 22;26(11):313. doi: 10.1007/s00894-020-04571-6.

Coupled Cluster Theory on Graphics Processing Units I. The Coupled Cluster Doubles Method.

J Chem Theory Comput. 2011 May 10;7(5):1287-95. doi: 10.1021/ct100584w. Epub 2011 Apr 15.

Stacked-Bloch-wave electron diffraction simulations using GPU acceleration.

Ultramicroscopy. 2014 Jun;141:32-7. doi: 10.1016/j.ultramic.2014.03.003. Epub 2014 Mar 17.

Graphics processing unit accelerated computation of digital holograms.

Appl Opt. 2009 Dec 1;48(34):H137-43. doi: 10.1364/AO.48.00H137.

Accelerating Coupled-Cluster Calculations with GPUs: An Implementation of the Density-Fitted CCSD(T) Approach for Heterogeneous Computing Architectures Using OpenMP Directives.

J Chem Theory Comput. 2023 Nov 14;19(21):7640-7657. doi: 10.1021/acs.jctc.3c00876. Epub 2023 Oct 25.

Accelerating All-Atom Normal Mode Analysis with Graphics Processing Unit.

J Chem Theory Comput. 2011 Jun 14;7(6):1595-603. doi: 10.1021/ct100728k. Epub 2011 May 25.
