Simulation Sciences Branch, U.S. Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, USA.
J Chem Phys. 2017 Sep 21;147(11):114109. doi: 10.1063/1.5002655.
In this work, we investigate a block Jacobi-Davidson (J-D) variant suitable for sparse symmetric eigenproblems where a substantial number of extremal eigenvalues are desired (e.g., ground-state real-space quantum chemistry). Most J-D variants tend to slow down as the number of desired eigenpairs increases, owing to frequent orthogonalization against a growing list of solved eigenvectors. In our specification of block J-D, all of the steps of the algorithm are performed in clusters, including the linear solves, which allows us to greatly reduce computational effort with blocked matrix-vector multiplies. In addition, we move orthogonalization against locked eigenvectors and working eigenvectors outside of the inner loop but retain the single Ritz vector projection corresponding to the index of the correction vector. Furthermore, we minimize the computational effort by constraining the working subspace to the current vectors being updated and the latest set of corresponding correction vectors. Finally, we incorporate accuracy thresholds based on the precision required by the Fermi-Dirac distribution. The net result is a significant reduction in computational effort relative to most previous block J-D implementations, especially as the number of wanted eigenpairs grows. We compare our approach with another robust implementation of block J-D (JDQMR) and the state-of-the-art Chebyshev filter subspace (CheFSI) method for various real-space density functional theory systems. Versus CheFSI, for first-row elements, our method yields competitive timings for valence-only systems and 4-6× speedups for all-electron systems, with up to 10× fewer matrix-vector multiplies. For all-electron calculations on larger elements (e.g., gold), where the wanted spectrum is quite narrow compared to the full spectrum, we observe a 60× speedup with 200× fewer matrix-vector multiplies vs. CheFSI.
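The structure described in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes a dense symmetric matrix, omits locking and the Fermi-Dirac thresholds, and replaces the blocked inner linear solve of the J-D correction equation with a single diagonally preconditioned (Davidson-type) step. The function name `block_davidson_sketch`, its parameters, and the clipping value `1e-2` are hypothetical choices made for illustration. What it does show is the blocked matrix-vector multiply, the single Ritz vector projection per correction column, the orthogonalization against the working block done once outside the correction step, and a working subspace restricted to the current vectors plus their latest corrections.

```python
import numpy as np

def block_davidson_sketch(A, k, tol=1e-8, max_outer=200):
    """Approximate the k lowest eigenpairs of a dense symmetric matrix A.

    Simplified illustration: one diagonally preconditioned step stands in
    for the blocked inner solve of the J-D correction equation.
    """
    n = A.shape[0]
    d = np.diag(A).copy()
    # Initial guess: unit vectors at the k smallest diagonal entries.
    X = np.zeros((n, k))
    X[np.argsort(d)[:k], np.arange(k)] = 1.0

    for _ in range(max_outer):
        AX = A @ X                        # one blocked mat-vec for all k columns
        theta = np.sum(X * AX, axis=0)    # Rayleigh quotients (X is orthonormal)
        R = AX - X * theta                # block of residuals
        if np.all(np.linalg.norm(R, axis=0) < tol):
            break

        # Approximate corrections t_i = -(D - theta_i I)^{-1} r_i, with small
        # denominators clipped to avoid division by (near) zero.
        denom = d[:, None] - theta[None, :]
        denom = np.where(np.abs(denom) < 1e-2, np.copysign(1e-2, denom), denom)
        T = -R / denom

        # Single Ritz vector projection: t_i <- (I - x_i x_i^T) t_i.
        T -= X * np.sum(X * T, axis=0)
        # Orthogonalize against the whole working block once, outside the
        # correction step.
        T -= X @ (X.T @ T)

        # Working subspace restricted to [X, T]; Rayleigh-Ritz on that basis.
        Q, _ = np.linalg.qr(np.hstack([X, T]))
        H = Q.T @ (A @ Q)
        _, S = np.linalg.eigh(H)
        X = Q @ S[:, :k]                  # keep the k lowest Ritz vectors

    theta = np.sum(X * (A @ X), axis=0)
    return theta, X

# Usage on a small diagonally dominant test matrix (hypothetical data).
rng = np.random.default_rng(0)
B = rng.standard_normal((100, 100))
A_demo = np.diag(np.arange(1.0, 101.0)) + 0.01 * (B + B.T) / 2
theta_demo, X_demo = block_davidson_sketch(A_demo, k=3)
print(np.max(np.abs(theta_demo - np.linalg.eigvalsh(A_demo)[:3])))
```

Because the subspace is rebuilt from only 2k vectors each outer iteration, the Rayleigh-Ritz problem stays small no matter how many iterations are taken. In a real-space DFT setting the matrix would be sparse and the correction step would be a blocked iterative solve rather than a diagonal scaling.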