Ryczko Kevin, Krogel Jaron T, Tamblyn Isaac
Good Chemistry Company, Vancouver, British Columbia V6E 4B1, Canada.
Materials Science and Technology Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States.
J Chem Theory Comput. 2022 Dec 13;18(12):7695-7701. doi: 10.1021/acs.jctc.2c00483. Epub 2022 Nov 1.
We present two machine learning methodologies that are capable of predicting diffusion Monte Carlo (DMC) energies with small data sets (≈60 DMC calculations in total). The first uses voxel deep neural networks (VDNNs) to predict DMC energy densities using Kohn-Sham density functional theory (DFT) electron densities as input. The second uses kernel ridge regression (KRR) to predict atomic contributions to the DMC total energy using atomic environment vectors as input (we used atom-centered symmetry functions, atomic environment vectors from the ANI models, and smooth overlap of atomic positions). We first compare the methodologies on pristine graphene lattices, where we find that the KRR methodology performs best in comparison to gradient boosted decision trees, random forest, Gaussian process regression, and multilayer perceptrons. In addition, KRR outperforms VDNNs by an order of magnitude. Afterward, we study the generalizability of KRR to predict the energy barrier associated with a Stone-Wales defect. Lastly, we move from 2D to 3D materials and use KRR to predict total energies of liquid water. In all cases, we find that the KRR models are more accurate than Kohn-Sham DFT and all mean absolute errors are less than chemical accuracy.
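As an illustration of the second approach, the following is a minimal sketch of kernel ridge regression in which a total energy is modeled as a sum of atomic contributions. It is not the authors' implementation. The Gaussian kernel, the hyperparameters `gamma` and `lam`, the function names, and the random stand-in descriptors and toy energies are all assumptions made for this example; no DMC data or real atomic environment vectors are used.

```python
import numpy as np

def atomic_kernel(X, Y, gamma):
    # Gaussian kernel between every row (atom descriptor) of X and of Y.
    d2 = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def structure_kernel(A, B, gamma):
    # Kernel between structures: sum of atomic kernels over all atom pairs.
    K = np.empty((len(A), len(B)))
    for a, Xa in enumerate(A):
        for b, Xb in enumerate(B):
            K[a, b] = atomic_kernel(Xa, Xb, gamma).sum()
    return K

def fit(train, energies, gamma=0.5, lam=1e-3):
    # Solve (K + lam * I) alpha = E for the regression weights.
    K = structure_kernel(train, train, gamma)
    return np.linalg.solve(K + lam * np.eye(len(train)), energies)

def predict(structs, train, alpha, gamma=0.5):
    # Predicted total energy = sum over atoms of the learned atomic energy.
    return structure_kernel(structs, train, gamma) @ alpha

rng = np.random.default_rng(0)
# Synthetic stand-in data: 60 structures, 8 atoms each, 4-component descriptors.
train = [rng.normal(size=(8, 4)) for _ in range(60)]
# Toy "total energies": a made-up per-atom function summed over atoms.
energies = np.array([np.sin(X).sum() for X in train])
alpha = fit(train, energies)
```

Because the structure kernel is a sum over atoms, the prediction for a structure is additive in its atoms: calling `predict` on the stacked descriptors of two structures gives the sum of their individual predictions, which is what allows a model trained on total energies to assign per-atom contributions.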