Stocks Ryan, Palethorpe Elise, Barca Giuseppe M J
School of Computing, Australian National University, Canberra, ACT 2601, Australia.
School of Computing and Information Systems, Melbourne University, Melbourne, VIC 3052, Australia.
J Chem Theory Comput. 2024 Sep 10;20(17):7503-7515. doi: 10.1021/acs.jctc.4c00877. Epub 2024 Aug 27.
This article presents an optimized algorithm and implementation for calculating resolution-of-the-identity Hartree-Fock (RI-HF) energies and analytic gradients using multiple graphics processing units (GPUs). The algorithm is designed specifically for high-throughput ab initio molecular dynamics (AIMD) simulations of small and medium-sized molecules (10-100 atoms). Key innovations of this work include the exploitation of multi-GPU parallelism and a workload balancing scheme that efficiently distributes computational tasks among GPUs. Our implementation also exploits symmetry, integral screening, and sparsity to optimize memory usage. Computational results show that the implementation achieves significant performance improvements, including more than 3× higher single-GPU AIMD throughput than previous GPU-accelerated RI-HF and traditional HF methods. Furthermore, using multiple GPUs can provide superlinear speedup when the additional aggregate GPU memory allows decompressed three-center integrals to be stored.
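The abstract does not describe the workload balancing scheme itself. As a rough illustration of distributing unequal-cost tasks among GPUs, the sketch below uses a generic greedy "longest processing time first" heuristic: tasks are sorted by estimated cost and each is handed to the currently least-loaded device. This is an assumption chosen for illustration, not the authors' method. The function name `balance_tasks`, the cost values, and the idea of one cost estimate per task (for example, per batch of integrals) are all hypothetical.

```python
import heapq


def balance_tasks(costs, n_gpus):
    """Greedy longest-processing-time-first assignment (illustrative only).

    costs  : list of estimated task costs in arbitrary, hypothetical units
    n_gpus : number of devices to spread the tasks over
    Returns (assignment, loads), where assignment[g] lists the task indices
    given to device g and loads[g] is the summed cost on that device.
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")

    # Min-heap of (current load, device index): the least-loaded device
    # is always at the top.
    heap = [(0.0, g) for g in range(n_gpus)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_gpus)]

    # Place the most expensive tasks first so that the cheap ones can
    # even out the remaining imbalance.
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    for task in order:
        load, gpu = heapq.heappop(heap)
        assignment[gpu].append(task)
        heapq.heappush(heap, (load + costs[task], gpu))

    loads = [sum(costs[t] for t in tasks) for tasks in assignment]
    return assignment, loads


if __name__ == "__main__":
    # Hypothetical cost estimates for seven tasks spread over three devices.
    example_costs = [9.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0]
    assignment, loads = balance_tasks(example_costs, 3)
    for gpu, (tasks, load) in enumerate(zip(assignment, loads)):
        print(f"GPU {gpu}: tasks {tasks}, estimated load {load}")
```

A static heuristic like this only balances well when the cost estimates are reasonable. Integral screening makes the real per-task cost data dependent, so a production scheme may need dynamic scheduling or measured timings instead.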