Department of Chemistry, University of Leuven, Celestijnenlaan 200F, Heverlee B-3001, Belgium.
J Comput Chem. 2013 Aug 15;34(22):1937-48. doi: 10.1002/jcc.23342. Epub 2013 Jun 7.
In this work, we present a parallel approach to complete and restricted active space second-order perturbation theory (CASPT2/RASPT2) and assess the performance characteristics of its implementation in the Molcas quantum chemistry program package. Parallel scaling is limited by memory and I/O bandwidth rather than by the number of available cores. Significant time savings for calculations on large and complex systems can be achieved by increasing the number of processes on a single machine, as long as memory bandwidth allows, or by using multiple nodes connected by a fast, low-latency interconnect. We found that parallel efficiency drops below 50% when using 8-16 cores on the shared-memory architecture, or 16-32 nodes on the distributed-memory architecture, depending on the calculation. This limits the scalability of the implementation to a moderate number of processes. Nonetheless, calculations that took more than 3 days on a serial machine could be performed in less than 5 h on an InfiniBand cluster whose individual nodes were not even capable of running the calculation because of its memory and I/O requirements. This makes it possible to continue studying larger molecular systems with CASPT2/RASPT2 by drawing on the aggregated computational resources of distributed computing systems.
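The scaling claims above rest on two standard quantities: speedup S = T_1 / T_p and parallel efficiency E = S / p, where T_1 is the serial wall time, T_p the wall time on p processes, and E < 0.5 is the "below 50%" threshold. The sketch below is illustrative only; the helper names are hypothetical and not part of Molcas. From the abstract's own figures (more than 3 days, i.e. over 72 h, versus under 5 h), only a lower bound on speedup follows. No efficiency can be derived from them, because the serial baseline ran on a different machine and the node count for that run is not stated. The efficiency timings in the comments are made-up values, not results from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_1 / T_p (both wall times in the same unit)."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("wall times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, n_procs: int) -> float:
    """Parallel efficiency E = S / p = T_1 / (p * T_p)."""
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    return speedup(t_serial, t_parallel) / n_procs


# Lower bound on speedup from the abstract: > 72 h serial vs < 5 h parallel.
print(speedup(72.0, 5.0))  # 14.4 (the true speedup exceeds this)

# Hypothetical timings (not from the paper) showing the 50% threshold:
print(efficiency(100.0, 20.0, 8))         # 0.625, above 50%
print(efficiency(100.0, 14.0, 16) < 0.5)  # True, below 50%
```

Because E divides by p, adding processes keeps raising speedup only as long as T_p falls at least as fast as p grows; once memory or I/O bandwidth saturates, T_p plateaus and E drops, which is the behavior the abstract reports.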