Ryan Stocks, Elise Palethorpe, Giuseppe M. J. Barca
School of Computing, Australian National University, Canberra, ACT 2601, Australia.
J Chem Theory Comput. 2024 Mar 26;20(6):2505-2519. doi: 10.1021/acs.jctc.3c01424. Epub 2024 Mar 8.
This article presents a novel algorithm for calculating analytic energy gradients of second-order Møller-Plesset perturbation theory within the Resolution-of-the-Identity approximation (RI-MP2), designed to achieve high performance on clusters with multiple graphics processing units (GPUs). The algorithm uses GPUs for all major steps of the calculation, including integral generation, formation of all required intermediate tensors, solution of the Z-vector equation, and gradient accumulation. The implementation in the EXtreme Scale Electronic Structure System (EXESS) software package includes a tailored, highly efficient multistream scheduling system that hides CPU-GPU data transfer latencies and allows nodes with 8 A100 GPUs to operate at over 80% of theoretical peak floating-point performance. Comparative performance analysis shows a significant reduction in computational time relative to traditional multicore CPU-based methods, with our approach achieving up to a 95-fold speedup over the single-node performance of established software such as Q-Chem and ORCA. Additionally, we demonstrate that pairing our implementation with the molecular fragmentation framework in EXESS can lower the computational scaling of RI-MP2 gradient calculations from quintic to subquadratic in system size, enabling further substantial savings in runtime while retaining high numerical accuracy in the resulting gradients.
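The abstract gives no equations, so the following is only a minimal sketch, assuming the standard closed-shell RI-MP2 energy expression, of the tensor contraction that underlies the method. It computes the correlation energy, not the analytic gradient: there is no Z-vector solve or gradient accumulation here. The names `ri_eri` and `ri_mp2_energy`, and the `B[P][i][a]` layout of the fitted three-centre factors, are hypothetical and are not taken from EXESS.

```python
"""Minimal pure-Python sketch of the closed-shell RI-MP2 correlation energy.

Assumption (not from the paper): B[P][i][a] are fitted three-centre factors
such that (ia|jb) ~= sum_P B[P][i][a] * B[P][j][b].
"""
from typing import List, Sequence, Tuple

Tensor3 = List[List[List[float]]]


def ri_eri(B: Tensor3, i: int, a: int, j: int, b: int) -> float:
    """Approximate the (ia|jb) integral by contracting over the auxiliary index P."""
    return sum(Bp[i][a] * Bp[j][b] for Bp in B)


def ri_mp2_energy(B: Tensor3, e_occ: Sequence[float],
                  e_vir: Sequence[float]) -> Tuple[float, float]:
    """Return the (opposite-spin, same-spin) RI-MP2 correlation energy components."""
    nocc, nvir = len(e_occ), len(e_vir)
    # Build the (ia|jb) block once, indexed as iajb[i][a][j][b].
    # This O(o^2 v^2 N_aux) step is the quintic-scaling bottleneck.
    iajb = [[[[ri_eri(B, i, a, j, b) for b in range(nvir)]
              for j in range(nocc)]
             for a in range(nvir)]
            for i in range(nocc)]
    e_os = 0.0
    e_ss = 0.0
    for i in range(nocc):
        for j in range(nocc):
            for a in range(nvir):
                for b in range(nvir):
                    # Orbital-energy denominator; negative when occupied < virtual.
                    denom = e_occ[i] + e_occ[j] - e_vir[a] - e_vir[b]
                    direct = iajb[i][a][j][b]    # (ia|jb)
                    exchange = iajb[i][b][j][a]  # (ib|ja)
                    e_os += direct * direct / denom
                    e_ss += (direct * direct - direct * exchange) / denom
    return e_os, e_ss


if __name__ == "__main__":
    # One auxiliary function, one occupied and one virtual orbital.
    os_part, ss_part = ri_mp2_energy([[[1.0]]], e_occ=[-0.5], e_vir=[0.5])
    print(os_part, ss_part, os_part + ss_part)  # -0.5 0.0 -0.5
```

Summing the two components gives the familiar total, a sum over $ijab$ of $[2(ia|jb)^2 - (ia|jb)(ib|ja)]$ divided by the orbital-energy denominator. Forming `iajb` costs on the order of $o^2 v^2 N_\text{aux}$ operations, which is the quintic scaling the abstract mentions. On GPUs this step maps naturally onto large matrix multiplications, and fragmentation reduces the effective system size fed into it.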