Department of Chemistry and Ames Laboratory, Iowa State University, Ames, Iowa 50011, United States.
J Chem Theory Comput. 2022 Apr 12;18(4):2144-2161. doi: 10.1021/acs.jctc.1c00820. Epub 2022 Apr 4.
In recent years, parallelism via multithreading has become extremely important to the optimization of high-performance electronic structure theory codes. Such multithreading is generally achieved via OpenMP constructs, using a fork-join threading model to enable thread-level data parallelism within the code. An alternative approach to multithreading is task-based parallelism, which displays multiple benefits relative to fork-join thread parallelism. A novel Restricted Hartree-Fock (RHF) algorithm, utilizing task-based parallelism to achieve optimal performance, was developed and implemented in the JuliaChem electronic structure theory software package. The new RHF algorithm utilizes a unique method of shell quartet batch creation, enabling construction and distribution of fine-grained shell quartet batches in a load-balanced manner using the Julia task construct. These shell quartet batches are distributed statically across message-passing interface (MPI) ranks and dynamically across threads within an MPI rank, with no explicit inter-rank or inter-thread synchronization required. Compared to the hybrid MPI/OpenMP RHF algorithm present in the GAMESS software package, the task-based algorithm demonstrates speedups of up to ∼40% for systems in the S22(3) test set of molecules, with system sizes up to ∼1000 basis functions. The JuliaChem algorithm demonstrates the viability of both the task-based parallelism model and the Julia programming language for the construction of performant electronic structure theory codes targeting systems of chemically relevant size.
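The two-level distribution described above can be illustrated with a minimal sketch. This is not the JuliaChem implementation, and it is written in Python rather than Julia. It assumes fixed-size batches and a round-robin assignment of batches to ranks, neither of which is specified in the abstract. The helper names (`shell_quartets`, `make_batches`, `process_batch`, `run_rank`, `run_all`) are hypothetical, MPI ranks are simulated by a plain loop, and the integral work is replaced by a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def shell_quartets(nshells):
    """Yield the symmetry-unique shell quartets (i, j, k, l)."""
    for i in range(nshells):
        for j in range(i + 1):
            for k in range(i + 1):
                # Restrict l so that the pair (k, l) never exceeds (i, j).
                lmax = j if k == i else k
                for l in range(lmax + 1):
                    yield (i, j, k, l)


def make_batches(nshells, batch_size):
    """Group quartets into fixed-size batches (batching rule is an assumption)."""
    quartets = shell_quartets(nshells)
    batches = []
    while True:
        batch = list(islice(quartets, batch_size))
        if not batch:
            break
        batches.append(batch)
    return batches


def process_batch(batch):
    """Placeholder for integral evaluation: just counts the quartets."""
    return len(batch)


def run_rank(batches, rank, nranks, nthreads):
    """Process the batches owned by one (simulated) rank."""
    # Static distribution: round-robin by batch index, decided without
    # any communication between ranks (assumed scheme).
    mine = batches[rank::nranks]
    # Dynamic distribution: one task per batch; a worker thread picks up
    # the next queued task as soon as it finishes its current one.
    with ThreadPoolExecutor(max_workers=nthreads) as pool:
        return sum(pool.map(process_batch, mine))


def run_all(nshells, batch_size, nranks, nthreads):
    """Simulate every rank in turn and return the per-rank quartet counts."""
    batches = make_batches(nshells, batch_size)
    return [run_rank(batches, r, nranks, nthreads) for r in range(nranks)]
```

Calling `run_all(4, 8, 3, 2)` splits the 55 unique quartets of a four-shell system into seven batches and processes them on three simulated ranks with two threads each. Smaller batches give the thread pool more freedom to even out uneven task costs, at the price of more scheduling overhead. The sketch only shows the scheduling pattern: CPython threads do not run CPU-bound work in parallel.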