Department of Chemistry, University of Southern California, Los Angeles, California, 90089-0482.
J Comput Chem. 2017 Apr 30;38(11):842-853. doi: 10.1002/jcc.24713.
A new hardware-agnostic contraction algorithm for tensors of arbitrary symmetry and sparsity is presented. The algorithm is implemented as a stand-alone open-source code, libxm, which is also integrated with the general tensor library libtensor and with the Q-Chem quantum-chemistry package. We give an overview of the algorithm and its implementation, and present benchmarks. Like other tensor software, the algorithm relies on efficient matrix multiplication libraries and assumes that tensors are stored in block-tensor form. Its distinguishing features are: (i) efficient repackaging of the individual blocks into large matrices and back, which enables efficient graphics processing unit (GPU)-accelerated calculations without modifying higher-level codes; (ii) fully asynchronous data transfer between disk storage and fast memory. The algorithm enables canonical all-electron coupled-cluster and equation-of-motion coupled-cluster calculations with single and double substitutions (CCSD and EOM-CCSD) with over 1000 basis functions on a single quad-GPU machine. We show that the algorithm exhibits the predicted theoretical scaling for canonical CCSD calculations, O(N^6), irrespective of the size of the data stored on disk. © 2017 Wiley Periodicals, Inc.
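The central idea of feature (i), replacing many small block-by-block products with one large matrix multiplication, can be illustrated with a minimal Python sketch. It is not libxm's implementation or API: the names `pack`, `unpack`, `matmul`, and `contract` are hypothetical, the example is reduced to an order-2 contraction C[i,j] = Σ_k A[i,k] B[k,j] with square blocks of side `bs`, permutational symmetry is ignored, and a pure-Python loop stands in for an optimized CPU or GPU matrix multiplication library. Absent blocks are treated as zero and padded into the packed matrix, so this toy version does not exploit sparsity the way a production block-tensor code would.

```python
from typing import Dict, List, Tuple

# Hypothetical block storage: (block_row, block_col) -> dense bs x bs block.
# Keys that are missing represent all-zero blocks.
Block = List[List[float]]
BlockMatrix = Dict[Tuple[int, int], Block]


def pack(blocks: BlockMatrix, nrow: int, ncol: int, bs: int) -> List[List[float]]:
    """Copy stored blocks into one dense (nrow*bs) x (ncol*bs) matrix; absent blocks stay zero."""
    big = [[0.0] * (ncol * bs) for _ in range(nrow * bs)]
    for (bi, bj), blk in blocks.items():
        for r in range(bs):
            for c in range(bs):
                big[bi * bs + r][bj * bs + c] = blk[r][c]
    return big


def unpack(big: List[List[float]], nrow: int, ncol: int, bs: int) -> BlockMatrix:
    """Split a dense matrix back into blocks, storing only blocks that are not all zero."""
    out: BlockMatrix = {}
    for bi in range(nrow):
        for bj in range(ncol):
            blk = [row[bj * bs:(bj + 1) * bs] for row in big[bi * bs:(bi + 1) * bs]]
            if any(x != 0.0 for row in blk for x in row):
                out[(bi, bj)] = blk
    return out


def matmul(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Naive stand-in for an optimized (e.g. BLAS or GPU) matrix multiplication."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)] for i in range(n)]


def contract(a_blocks: BlockMatrix, b_blocks: BlockMatrix,
             ni: int, nk: int, nj: int, bs: int) -> BlockMatrix:
    """C[i,j] = sum_k A[i,k] B[k,j]: pack both operands, do one large multiply, unpack C."""
    a = pack(a_blocks, ni, nk, bs)
    b = pack(b_blocks, nk, nj, bs)
    return unpack(matmul(a, b), ni, nj, bs)


if __name__ == "__main__":
    # A has 2x2 blocks, B has 2x1 blocks, block size 2.
    a = {(0, 0): [[1.0, 2.0], [3.0, 4.0]], (1, 1): [[1.0, 0.0], [0.0, 1.0]]}
    b = {(0, 0): [[1.0, 0.0], [0.0, 1.0]], (1, 0): [[5.0, 6.0], [7.0, 8.0]]}
    print(contract(a, b, ni=2, nk=2, nj=1, bs=2))
```

The design choice being illustrated is a trade: the extra copies into and out of the packed buffers are paid once, in exchange for a single large multiplication that a GEMM routine, and particularly a GPU, can execute far more efficiently than many tiny per-block products.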