

Double-buffered, heterogeneous CPU + GPU integral digestion algorithm for single-excitation calculations involving a large number of excited states.

Author information

Morrison Adrian F, Epifanovsky Evgeny, Herbert John M

Affiliations

Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio.

Q-Chem Inc., Pleasanton, California.

Publication information

J Comput Chem. 2018 Oct 5;39(26):2173-2182. doi: 10.1002/jcc.25531. Epub 2018 Oct 3.

Abstract

The most widely used quantum-chemical models for excited states are single-excitation theories, a category that includes configuration interaction with single substitutions, time-dependent density functional theory, and also a recently developed ab initio exciton model. When a large number of excited states are desired, these calculations incur a significant bottleneck in the "digestion" step in which two-electron integrals are contracted with density or density-like matrices. We present an implementation that moves this step onto graphical processing units (GPUs), and introduce a double-buffer scheme that minimizes latency by computing integrals on the central processing units (CPUs) concurrently with their digestion on the GPUs. An automatic code generation scheme simplifies the implementation of high-performance GPU kernels. For the exciton model, which requires separate excited-state calculations on each electronically coupled chromophore, the heterogeneous implementation described here results in speedups of 2-6× versus a CPU-only implementation. For traditional time-dependent density functional theory calculations, we obtain speedups of up to 5× when a large number of excited states is computed. © 2018 Wiley Periodicals, Inc.
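To make the double-buffering idea concrete, the following is a minimal, hypothetical CUDA sketch, not the authors' implementation or Q-Chem code: two pinned host buffers alternate roles so that the CPU fills one buffer with an integral batch while an asynchronous copy and a stand-in "digestion" kernel process the other buffer on its own stream. All names (digest_batch, compute_integrals_on_cpu), buffer sizes, and the element-wise contraction are illustrative assumptions; the paper's actual kernels are generated automatically and perform genuine integral contractions.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

constexpr int BATCH = 1 << 20;   // integrals per batch (arbitrary size)
constexpr int NMAT  = 1024;      // flattened size of a density-like matrix

// Hypothetical stand-in for a generated "digestion" kernel: contracts one
// batch of two-electron integrals with a density-like matrix, element-wise.
__global__ void digest_batch(const double* integrals, const double* density,
                             double* out, int n, int nmat) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = integrals[i] * density[i % nmat];
}

// Stand-in for the CPU integral code: fills one pinned host buffer.
static void compute_integrals_on_cpu(double* buf, int n, int batch) {
    for (int i = 0; i < n; ++i) buf[i] = 1e-3 * ((i + batch) % 97);
}

int main() {
    double *host_buf[2], *dev_buf[2], *dev_out, *dev_density;
    cudaStream_t stream[2];

    for (int b = 0; b < 2; ++b) {
        cudaMallocHost(&host_buf[b], BATCH * sizeof(double));  // pinned memory
        cudaMalloc(&dev_buf[b], BATCH * sizeof(double));
        cudaStreamCreate(&stream[b]);
    }
    cudaMalloc(&dev_out, BATCH * sizeof(double));
    cudaMalloc(&dev_density, NMAT * sizeof(double));
    cudaMemset(dev_density, 0, NMAT * sizeof(double));  // placeholder density

    const int n_batches = 8;
    for (int k = 0; k < n_batches; ++k) {
        int b = k % 2;
        // Wait until this buffer's previous copy and kernel have finished.
        cudaStreamSynchronize(stream[b]);
        // The CPU produces the next integral batch while the *other* stream's
        // copy and digestion kernel may still be running on the GPU.
        compute_integrals_on_cpu(host_buf[b], BATCH, k);
        cudaMemcpyAsync(dev_buf[b], host_buf[b], BATCH * sizeof(double),
                        cudaMemcpyHostToDevice, stream[b]);
        digest_batch<<<(BATCH + 255) / 256, 256, 0, stream[b]>>>(
            dev_buf[b], dev_density, dev_out, BATCH, NMAT);
    }
    cudaDeviceSynchronize();
    printf("processed %d batches\n", n_batches);

    for (int b = 0; b < 2; ++b) {
        cudaFreeHost(host_buf[b]);
        cudaFree(dev_buf[b]);
        cudaStreamDestroy(stream[b]);
    }
    cudaFree(dev_out);
    cudaFree(dev_density);
    return 0;
}
```

The point of the two-buffer rotation is the latency hiding described in the abstract: as long as one CPU integral batch takes roughly as long as one GPU copy-plus-digestion pass, neither processor sits idle.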

