Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, Tennessee 37996, USA.
Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, USA.
J Chem Phys. 2017 Nov 7;147(17):174107. doi: 10.1063/1.4998616.
Within ab initio Quantum Monte Carlo simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunction. Each Monte Carlo step requires finding the determinant of a dense matrix. This is most commonly iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. The overall computational cost is, therefore, formally cubic in the number of electrons or matrix size. To improve the numerical efficiency of this procedure, we propose a novel multiple rank delayed update scheme. This strategy enables probability evaluation with an application of accepted moves to the matrices delayed until after a predetermined number of moves, K. The accepted events are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency via matrix-matrix operations instead of matrix-vector operations. This procedure does not change the underlying Monte Carlo sampling or its statistical efficiency. For calculations on large systems and algorithms such as diffusion Monte Carlo, where the acceptance ratio is high, order of magnitude improvements in the update time can be obtained on both multi-core central processing units and graphical processing units.
在从头算量子蒙特卡罗模拟中,对于大型系统,主要的数值成本是在试探波函数中计算 Slater 行列式的值。每个蒙特卡罗步骤都需要找到密集矩阵的行列式。这通常使用秩-1 Sherman-Morrison 更新方案迭代评估,以避免重复显式计算逆矩阵。因此,总的计算成本在形式上是电子数或矩阵大小的立方。为了提高该过程的数值效率,我们提出了一种新的多阶延迟更新方案。该策略通过将接受的移动应用于矩阵的延迟,直到预定数量的移动,K,来实现概率评估。然后,通过矩阵-矩阵操作而不是矩阵-向量操作,将接受的事件整块应用于矩阵,以提高算术强度和计算效率。该过程不会改变底层的蒙特卡罗采样或其统计效率。对于大型系统的计算和扩散蒙特卡罗等算法,接受率较高,在多核中央处理器和图形处理单元上都可以获得更新时间的数量级改进。