Luo Yu, Meziere Jason A, Samolyuk German D, Hart Gus L W, Daymond Mark R, Béland Laurent Karim
Department of Mechanical and Materials Engineering, Queen's University, Kingston, Ontario K7L 2N8, Canada.
Department of Physics, Brigham Young University, Provo, Utah 84602, United States.
J Chem Theory Comput. 2023 Oct 10;19(19):6848-6856. doi: 10.1021/acs.jctc.3c00488. Epub 2023 Sep 12.
Machine learning force fields (MLFFs) are an increasingly popular choice for atomistic simulations due to their high fidelity and improvable nature. Here we propose a hybrid small-cell approach that combines attributes of both offline and active learning to systematically expand a quantum-mechanical (QM) database while constructing MLFFs of increasing model complexity. Our MLFFs employ the moment tensor potential formalism. During this process, we quantitatively assess the structural properties, elastic properties, dimer potential energies, melting temperatures, phase stability, point defect formation energies, point defect migration energies, free surface energies, and generalized stacking fault (GSF) energies of Zr as predicted by our MLFFs. Unsurprisingly, prediction accuracy correlates positively with model complexity. We also find that the MLFFs can predict the properties of out-of-sample configurations without these specific configurations being directly included in the training dataset. Additionally, we generate 100 MLFFs of high complexity (1513 parameters each) that reach different local optima during training. Their predictions cluster around the benchmark density functional theory (DFT) values, but subtle physical features, such as the location of local minima on the GSF energy surface, are washed out by statistical noise.
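The closing observation about the 100-model ensemble can be illustrated with a minimal sketch of how ensemble spread might be compared against the size of a physical feature. This is not the authors' analysis: the function names, the `n_sigma` threshold, and the numbers are hypothetical and chosen only for illustration.

```python
import statistics


def ensemble_summary(predictions):
    """Return (mean, sample standard deviation) of one property
    as predicted by an ensemble of independently trained models."""
    mean = statistics.fmean(predictions)
    spread = statistics.stdev(predictions)
    return mean, spread


def feature_is_resolved(feature_depth, spread, n_sigma=2.0):
    """Hypothetical criterion (an assumption, not from the paper):
    a feature counts as resolved only if its depth exceeds
    n_sigma times the ensemble spread."""
    return feature_depth > n_sigma * spread


# Made-up predictions in arbitrary units from five hypothetical models.
# These are NOT data from the paper.
predictions = [1.0, 1.2, 0.9, 1.1, 0.8]
mean, spread = ensemble_summary(predictions)

# A shallow feature (depth 0.05) is smaller than the model-to-model
# scatter, while a deep one (depth 0.5) stands clear of it.
shallow_resolved = feature_is_resolved(0.05, spread)
deep_resolved = feature_is_resolved(0.5, spread)
```

Under this assumed criterion, a shallow local minimum whose depth is comparable to the scatter between models would be indistinguishable from noise, even when the ensemble mean sits close to the reference value.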