Asadchev Andrey, Valeev Edward F
Department of Chemistry, Virginia Tech, Blacksburg, Virginia 24061, United States.
J Chem Theory Comput. 2023 Mar 28;19(6):1698-1710. doi: 10.1021/acs.jctc.2c00995. Epub 2023 Mar 14.
To improve the efficiency of Gaussian integral evaluation on modern accelerated architectures, FLOP-efficient Obara-Saika-based recursive evaluation schemes are optimized for the memory footprint. For the 3-center 2-particle integrals that are key for the evaluation of Coulomb and other 2-particle interactions in the density-fitting approximation, the use of multiquantal recurrences (in which multiple quanta are created or transferred at once) is shown to produce significant memory savings. Other innovations include leveraging register memory to reduce the memory footprint and generating optimized kernels directly at compile time (instead of using a custom code generator) with the compile-time features of modern C++/CUDA. Performance of conventional and CUDA-based implementations of the proposed schemes is illustrated both for individual batches of integrals involving Gaussians with low and high angular momenta (up to l = 6) and contraction degrees, and for the density-fitting-based evaluation of the Coulomb potential. The computer implementation is available in the open-source LibintX library.
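As a rough illustration of why the memory footprint matters at high angular momentum, and of what compile-time kernel generation can look like in plain C++, the sketch below sizes one uncontracted 3-center batch from template parameters. It assumes Cartesian Gaussians, for which a shell of angular momentum l has (l + 1)(l + 2)/2 components; the names `ncart`, `Batch3`, `batch_bytes`, and `aux_batch_size` are hypothetical and are not taken from LibintX.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Number of Cartesian Gaussian components in a shell of angular momentum l:
// (l + 1)(l + 2) / 2, i.e. 1 for s, 3 for p, 6 for d.
constexpr std::size_t ncart(int l) {
    return static_cast<std::size_t>((l + 1) * (l + 2) / 2);
}

// Hypothetical storage for one 3-center batch (X|ab). The angular momenta are
// template parameters, so the size is a compile-time constant and the compiler
// can specialize (and unroll) code separately for every (X, A, B) combination.
template <int X, int A, int B>
struct Batch3 {
    static constexpr std::size_t size = ncart(X) * ncart(A) * ncart(B);
    std::array<double, size> values{};  // fixed-size, no heap allocation
};

// Bytes needed for the target integrals of one batch (intermediates excluded).
template <int X, int A, int B>
constexpr std::size_t batch_bytes() {
    return Batch3<X, A, B>::size * sizeof(double);
}

// Hypothetical runtime-to-compile-time dispatch on the auxiliary shell,
// restricted to s and p here to keep the sketch short.
inline std::size_t aux_batch_size(int x) {
    assert(x >= 0 && x <= 1);
    return x == 0 ? Batch3<0, 1, 1>::size : Batch3<1, 1, 1>::size;
}

static_assert(ncart(0) == 1 && ncart(1) == 3 && ncart(2) == 6, "s, p, d shells");
static_assert(Batch3<6, 6, 6>::size == 28 * 28 * 28, "l = 6 on all three centers");
```

Under these assumptions a single batch with l = 6 on all three centers already holds 28 × 28 × 28 = 21952 target integrals before any recurrence intermediates are counted, which gives a sense of why the size of the intermediates is the quantity worth optimizing.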