Microsoft Research AI4Science, Beijing 100084, China.
College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China.
J Chem Phys. 2023 Jul 21;159(3). doi: 10.1063/5.0147023.
Machine learning force fields (MLFFs) have gained popularity in recent years because they provide a cost-effective alternative to ab initio molecular dynamics (MD) simulations. Even when their test-set error is small, MLFFs inherently suffer from generalization and robustness issues during MD simulations. To alleviate these issues, we propose global force metrics and fine-grained metrics at the element and conformation levels that systematically evaluate MLFFs for every atom and every conformation of a molecule. We selected three state-of-the-art MLFFs (ET, NequIP, and ViSNet) and comprehensively evaluated them on the aspirin, Ac-Ala3-NHMe, and Chignolin MD datasets, whose molecules range from 21 to 166 atoms. Using the MLFFs trained on these molecules, we performed MD simulations from different initial conformations, analyzed the relationship between the force metrics and the stability of the simulation trajectories, and investigated why some simulations collapsed. Finally, the proposed force metrics can guide model training to further improve both MLFF performance and MD stability, specifically by training MLFF models with these force metrics as loss functions, fine-tuning by reweighting samples in the original dataset, and continuing training on additional, previously unexplored data.
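The abstract does not spell out how the force metrics are defined, so the following is only a minimal sketch of what global, per-element, and per-conformation force errors might look like. The function names `force_error_metrics` and `sample_weights`, the choice of the Euclidean norm of the per-atom force error, and the error-proportional reweighting rule are all illustrative assumptions, not the definitions used in the paper.

```python
import math
from collections import defaultdict


def force_error_metrics(pred, ref, elements):
    """Summarize force errors (hypothetical metric definitions).

    pred, ref: nested lists shaped [n_conformations][n_atoms][3].
    elements:  list of element symbols, one per atom.
    """
    per_element = defaultdict(list)
    per_conf_mean, per_conf_max = [], []
    abs_component_sum, n_components = 0.0, 0

    for f_pred, f_ref in zip(pred, ref):
        atom_errors = []
        for symbol, p, r in zip(elements, f_pred, f_ref):
            diff = [pi - ri for pi, ri in zip(p, r)]
            # Global metric: mean absolute error over all force components.
            abs_component_sum += sum(abs(d) for d in diff)
            n_components += len(diff)
            # Fine-grained metrics use the norm of the per-atom error vector.
            err = math.sqrt(sum(d * d for d in diff))
            atom_errors.append(err)
            per_element[symbol].append(err)
        per_conf_mean.append(sum(atom_errors) / len(atom_errors))
        per_conf_max.append(max(atom_errors))

    return {
        "global_mae": abs_component_sum / n_components,
        "per_element": {s: sum(v) / len(v) for s, v in per_element.items()},
        "per_conf_mean": per_conf_mean,
        "per_conf_max": per_conf_max,
    }


def sample_weights(per_conf_error):
    """Assumed reweighting rule: weight each conformation by its error share."""
    total = sum(per_conf_error)
    if total == 0:
        return [1.0 / len(per_conf_error)] * len(per_conf_error)
    return [e / total for e in per_conf_error]


# Toy data: two conformations of a three-atom system (made-up numbers).
elements = ["H", "H", "O"]
pred = [[[0.0, 0.0, 0.0] for _ in elements] for _ in range(2)]
ref = [
    [[3.0, 4.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 2.0]],
]

metrics = force_error_metrics(pred, ref, elements)
weights = sample_weights(metrics["per_conf_mean"])
print(metrics)
print(weights)
```

In this sketch, conformations with a larger mean force error receive a larger sampling weight, which is one simple way the fine-tuning-by-reweighting idea could be realized. A per-conformation maximum is reported alongside the mean because a single badly predicted atom can plausibly destabilize a trajectory even when the average error looks small.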