Bruno Focassio, Luis Paulo M Freitas, Gabriel R Schleder
Brazilian Nanotechnology National Laboratory (LNNano/CNPEM), Campinas 13083-100, São Paulo, Brazil.
John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts 02138, United States.
ACS Appl Mater Interfaces. 2025 Mar 5;17(9):13111-13121. doi: 10.1021/acsami.4c03815. Epub 2024 Jul 11.
Machine learning interatomic potentials (MLIPs) are one of the main techniques in the materials science toolbox, bridging ab initio accuracy with the computational efficiency of classical force fields. This enables simulations ranging from atoms, molecules, and biosystems to solids and bulk materials, surfaces, nanomaterials, and their interfaces and complex interactions. A recent class of advanced MLIPs, built on equivariant representations and deep graph neural networks, is known as universal models. These models are proposed as foundation models suitable for any system, covering most elements of the periodic table. Current universal MLIPs (UIPs) have been trained on the largest consistent data sets available today; however, these data sets consist mostly of DFT calculations of bulk materials. In this article, we assess the universality of all openly available UIPs, namely MACE, CHGNet, and M3GNet, on a representative generalization task: the calculation of surface energies. We find that the out-of-the-box foundation models show significant shortcomings on this task, with errors that correlate with the total energy of the surface simulations, which lie out of the domain of the training data set. Our results show that while UIPs are an efficient starting point for fine-tuning specialized models, increasing the coverage of the materials space is needed to move toward truly universal training data sets for MLIPs.
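For context on the benchmark task: surface energy is conventionally obtained from slab and bulk total energies as γ = (E_slab − n·E_bulk) / (2A), where n is the number of bulk formula units in the slab, A is the area of one slab face, and the factor of 2 accounts for the slab's two surfaces. A minimal sketch of this formula follows; the numeric values in the usage line are illustrative placeholders, not results from the paper, and the function name is our own:

```python
def surface_energy(e_slab, n_bulk_units, e_bulk_per_unit, area):
    """Surface energy gamma = (E_slab - n * E_bulk) / (2 * A).

    e_slab          -- total energy of the slab (eV)
    n_bulk_units    -- number of bulk formula units contained in the slab
    e_bulk_per_unit -- bulk reference energy per formula unit (eV)
    area            -- area of one face of the slab (Angstrom^2)

    The factor of 2 accounts for the two equivalent surfaces of the slab.
    Returns gamma in eV/Angstrom^2.
    """
    return (e_slab - n_bulk_units * e_bulk_per_unit) / (2.0 * area)


# Illustrative (made-up) numbers: a 10-formula-unit slab at -39.2 eV,
# bulk reference at -4.0 eV per formula unit, face area of 25 Angstrom^2.
gamma = surface_energy(-39.2, 10, -4.0, 25.0)  # -> 0.016 eV/Angstrom^2
```

In practice, E_slab and E_bulk would come from the same method (DFT or an MLIP calculator) with consistent settings, since the formula is a difference of large total energies; this sensitivity to total-energy errors is precisely what the benchmark in the article probes.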