Radova Mariia, Stark Wojciech G, Allen Connor S, Maurer Reinhard J, Bartók Albert P
Department of Chemistry, University of Warwick, Coventry, UK.
Department of Physics, University of Warwick, Coventry, UK.
NPJ Comput Mater. 2025;11(1):237. doi: 10.1038/s41524-025-01727-x. Epub 2025 Jul 18.
Machine-learned interatomic potentials are revolutionising atomistic materials simulations by providing accurate and scalable predictions within the scope covered by the training data. However, generation of an accurate and robust training data set remains a challenge, often requiring thousands of first-principles calculations to achieve high accuracy. Foundation models have started to emerge with the ambition to create universally applicable potentials across a wide range of materials. While foundation models can be robust and transferable, they do not yet achieve the accuracy required to predict reaction barriers, phase transitions, and material stability. This work demonstrates that foundation model potentials can reach chemical accuracy when fine-tuned using transfer learning with partially frozen weights and biases. For two challenging datasets, one on reactive chemistry at surfaces and one on the stability and elastic properties of ternary alloys, we show that frozen transfer learning with 10–20% of the data (hundreds of datapoints) achieves similar accuracies to models trained from scratch (on thousands of datapoints). Moreover, we show that an equally accurate, but significantly more efficient surrogate model can be built using the transfer learned potential as the ground truth. In combination, we present a simulation workflow for machine learning potentials that improves data efficiency and computational efficiency.
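The idea of fine-tuning with partially frozen weights and biases can be illustrated with a deliberately small toy. The sketch below is not the authors' implementation and does not use any interatomic-potential library: the two-layer scalar model, the parameter names (`w1`, `b1`, `w2`, `b2`), the helper `fine_tune`, and the synthetic target data are all assumptions made for illustration. It only shows the mechanism: parameters listed as frozen keep their pretrained values, while the remaining ones are updated by gradient descent on a small target dataset.

```python
import math


def predict(params, x):
    """Toy two-layer model: a tanh feature layer followed by a linear readout."""
    hidden = math.tanh(params["w1"] * x + params["b1"])
    return params["w2"] * hidden + params["b2"]


def mse(params, data):
    """Mean squared error of the model over (x, y) pairs."""
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)


def fine_tune(params, data, frozen, lr=0.1, steps=500):
    """Gradient descent on the MSE that never updates parameters named in `frozen`.

    Hypothetical helper for illustration; returns a new parameter dict.
    """
    params = dict(params)  # work on a copy so the pretrained values stay intact
    n = len(data)
    for _ in range(steps):
        grads = {name: 0.0 for name in params}
        for x, y in data:
            h = math.tanh(params["w1"] * x + params["b1"])
            err = params["w2"] * h + params["b2"] - y
            dh = params["w2"] * (1.0 - h * h)  # derivative through the tanh layer
            grads["w1"] += 2.0 * err * dh * x / n
            grads["b1"] += 2.0 * err * dh / n
            grads["w2"] += 2.0 * err * h / n
            grads["b2"] += 2.0 * err / n
        for name in params:
            if name not in frozen:  # frozen parameters are skipped entirely
                params[name] -= lr * grads[name]
    return params


# Assumed "pretrained" parameters standing in for a foundation model.
pretrained = {"w1": 1.0, "b1": 0.0, "w2": 1.0, "b2": 0.0}

# Small synthetic target dataset (nine points) that differs from the
# pretrained model only in the readout: y = 2 * tanh(x) + 0.5.
target_data = [(i / 2.0, 2.0 * math.tanh(i / 2.0) + 0.5) for i in range(-4, 5)]

# Freeze the feature layer; fine-tune only the readout weight and bias.
tuned = fine_tune(pretrained, target_data, frozen={"w1", "b1"})

print("error before fine-tuning:", mse(pretrained, target_data))
print("error after fine-tuning: ", mse(tuned, target_data))
```

In this toy the frozen feature layer already matches the target task, so adjusting the readout alone is enough to fit the nine points. Which layers of a real foundation model to freeze is a separate design choice that this sketch does not address.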