Department of Chemistry and Department of Biochemistry and Molecular Biology, Michigan State University, 578 S. Shaw Lane, East Lansing, Michigan 48824-1322, United States.
San Diego Supercomputer Center, University of California San Diego, 9500 Gilman Drive, La Jolla, California 92093-0505, United States.
J Chem Theory Comput. 2021 Jul 13;17(7):3955-3966. doi: 10.1021/acs.jctc.1c00145. Epub 2021 Jun 1.
We report a new multi-GPU-capable Hartree-Fock/density functional theory implementation integrated into the open-source QUantum Interaction Computational Kernel (QUICK) program. Details on the load balancing algorithms for electron repulsion integrals and exchange-correlation quadrature across multiple GPUs are described. Benchmarking studies carried out on up to four GPU nodes, each containing four NVIDIA V100-SXM2 GPUs, demonstrate that our implementation achieves excellent load balancing and high parallel efficiency. For representative medium- to large-size protein/organic molecular systems, the observed parallel efficiencies remained above 82% for Kohn-Sham matrix formation and above 90% for nuclear gradient calculations. On NVIDIA A100, P100, and K80 platforms, the implementation also achieved parallel efficiencies higher than 68% in all tested cases, paving the way for large-scale electronic structure calculations with QUICK.
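For readers unfamiliar with the metric, parallel efficiency figures of this kind are commonly computed as the speedup over a single device divided by the number of devices. The abstract does not state the exact definition used, so the sketch below assumes this conventional strong-scaling form. The function name and the timings are hypothetical and are not taken from the paper.

```python
def parallel_efficiency(t_single: float, t_parallel: float, n_gpus: int) -> float:
    """Strong-scaling parallel efficiency (assumed conventional definition).

    efficiency = (t_single / t_parallel) / n_gpus
    A value of 1.0 means ideal scaling; lower values indicate overhead
    such as load imbalance or communication cost.
    """
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    if t_single <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    speedup = t_single / t_parallel
    return speedup / n_gpus


if __name__ == "__main__":
    # Hypothetical wall-clock timings in seconds, for illustration only.
    eff = parallel_efficiency(t_single=100.0, t_parallel=7.5, n_gpus=16)
    print(f"parallel efficiency on 16 GPUs: {eff:.1%}")
```

Under this definition, an efficiency above 82% on 16 GPUs would correspond to a speedup of more than about 13 times over a single GPU.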