Julio Daniel Carvalho Maia, Lucidio Dos Anjos Formiga Cabral, Gerd Bruno Rocha
Centro de Informática, Universidade Federal da Paraíba, João Pessoa, PB, CEP: 58055-000, Brazil.
Theoretical and Computational Biophysics Group - Beckman Institute, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
J Mol Model. 2020 Oct 22;26(11):313. doi: 10.1007/s00894-020-04571-6.
Density matrix purification methods should be employed when dealing with complex chemical systems containing many atoms. If the sparsity of the density matrix is exploited, the running times of these methods scale linearly with the number of atoms. Because the efficiency expected from these methods is closely tied to the underlying parallel implementations of linear algebra operations (e.g., P = P × P), we proposed central processing unit (CPU) and graphics processing unit (GPU) parallel matrix-matrix multiplications in the SVBR (symmetrical variable block row) format for energy calculations with the SP2 algorithm. This algorithm was inserted into MOPAC's MOZYME method, using its original LMO Fock matrix assembly and the atomic integral calculation implemented in it. Correctness and performance tests show that the implemented SP2 is accurate and fast: for a water cluster system with 42,312 orbitals, running on one NVIDIA K40 GPU card achieves speedups of up to 40 times compared with the single-threaded version. The GPU-accelerated SP2 algorithm, combined with the MOZYME LMO framework, enables semiempirical wavefunction calculations with stricter SCF criteria for localized charged molecular systems, as well as single-point energy calculations for molecules with more than 100,000 LMO orbitals in less than 1 h.

Graphical abstract: Parallel CPU and GPU purification algorithms for electronic structure calculations were implemented in MOPAC's MOZYME method. Some matrices in these calculations, e.g., the electron density P, are compressed, and the developed linear algebra operations deal with non-zero entries only. We employed the NVIDIA/CUDA platform to develop the GPU algorithms, and speedups of up to 40 times were achieved for larger systems.
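The SP2 purification step and the idea of multiplying only non-zero entries can be illustrated with a small sketch. The Python code below is a minimal SP2 recursion, assuming a symmetric Fock/Hamiltonian matrix with a gap between occupied and virtual levels, stored as a hypothetical dict-of-dicts `{row: {col: value}}` that keeps non-zero entries only. This storage is a simplified stand-in chosen for illustration; it is not the SVBR block layout, nor the MOZYME or CUDA code described above, and all function names are hypothetical. The spectrum is first mapped into [0, 1] with Gershgorin bounds; each iteration then applies either X² (which lowers the trace) or 2X − X² (which raises it), picking whichever brings the trace closer to the number of occupied orbitals, until X is idempotent.

```python
"""Minimal SP2 purification sketch with non-zero-only matrix products.

Assumption (illustrative only): matrices are dict-of-dicts {i: {j: value}}
holding non-zero entries; this is NOT the SVBR format used in MOPAC/MOZYME.
"""


def sparse_matmul(a, b, drop_tol=1e-12):
    """Return A @ B, touching only stored entries; results below drop_tol are discarded."""
    c = {}
    for i, row in a.items():
        acc = {}
        for k, a_ik in row.items():
            for j, b_kj in b.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        kept = {j: v for j, v in acc.items() if abs(v) > drop_tol}
        if kept:
            c[i] = kept
    return c


def trace(a):
    """Sum of the stored diagonal entries."""
    return sum(row.get(i, 0.0) for i, row in a.items())


def add_scaled(a, b, alpha, beta):
    """Return alpha * A + beta * B as a new dict-of-dicts matrix."""
    c = {}
    for m, s in ((a, alpha), (b, beta)):
        for i, row in m.items():
            ci = c.setdefault(i, {})
            for j, v in row.items():
                ci[j] = ci.get(j, 0.0) + s * v
    return c


def gershgorin_bounds(h, n):
    """Lower and upper bounds on the eigenvalues of H from Gershgorin discs."""
    lo, hi = float("inf"), float("-inf")
    for i in range(n):
        row = h.get(i, {})
        d = row.get(i, 0.0)
        r = sum(abs(v) for j, v in row.items() if j != i)
        lo, hi = min(lo, d - r), max(hi, d + r)
    return lo, hi


def sp2(h, n, n_occ, tol=1e-10, max_iter=100):
    """Return an idempotent matrix P with trace n_occ (projector onto the lowest n_occ states)."""
    e_min, e_max = gershgorin_bounds(h, n)
    identity = {i: {i: 1.0} for i in range(n)}
    scale = 1.0 / (e_max - e_min)
    # X0 = (e_max*I - H) / (e_max - e_min): eigenvalues in [0, 1], low energies near 1
    x = add_scaled(identity, h, e_max * scale, -scale)
    for _ in range(max_iter):
        x2 = sparse_matmul(x, x)
        tr_x, tr_x2 = trace(x), trace(x2)
        # tr(X - X^2) measures the remaining idempotency error
        if abs(tr_x - tr_x2) < tol:
            return x
        # X^2 lowers the trace, 2X - X^2 raises it; take the step closer to n_occ
        if abs(tr_x2 - n_occ) < abs(2.0 * tr_x - tr_x2 - n_occ):
            x = x2
        else:
            x = add_scaled(x, x2, 2.0, -1.0)
    raise RuntimeError("SP2 did not converge")


if __name__ == "__main__":
    # Hypothetical 4-orbital model: two low and two high levels, weakly coupled
    h = {0: {0: -2.0, 1: 0.2}, 1: {0: 0.2, 1: -1.5, 2: 0.2},
         2: {1: 0.2, 2: 1.5, 3: 0.2}, 3: {2: 0.2, 3: 2.0}}
    p = sp2(h, n=4, n_occ=2)
    print("trace(P) =", round(trace(p), 6))
```

The returned matrix is the projector onto the occupied subspace; a closed-shell density matrix is usually twice this projector. Production implementations obtain linear scaling by keeping the matrices sparse throughout the recursion, which the drop tolerance in `sparse_matmul` only crudely imitates.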