Kahle Leonid, Minisini Benoit, Bui Tai, First Jeremy T, Buda Corneliu, Goldman Thomas, Wimmer Erich
Materials Design SARL, 42 avenue Verdier, 92120 Montrouge, France.
bp Exploration Operating Co. Ltd, Chertsey Road, Sunbury-on-Thames TW16 7LN, UK.
Phys Chem Chem Phys. 2024 Aug 28;26(34):22665-22680. doi: 10.1039/d4cp01980f.
Machine-learned potentials (MLPs) trained on data combine the computational efficiency of classical interatomic potentials with the accuracy and generality of the first-principles method used to create the respective training set. In this work, we implement and train an MLP to obtain an accurate description of the potential energy surface of organic compounds, and accurate predictions of their properties, both as single molecules and in the condensed phase. We devise a dual descriptor, based on the atomic cluster expansion (ACE), that couples an information-rich short-range description with a coarser long-range description that captures weak intermolecular interactions. We employ uncertainty-guided active learning to generate the training set, creating a dataset that is comparatively small given the breadth of application and that consists of alcohols, alkanes, and an adipate. Using this MLP, we calculate the densities of these systems, for varying chain lengths, as a function of temperature and obtain a discrepancy of less than 4% compared with experiment. Vibrational frequencies calculated with the MLP have a root mean square error of less than 1 THz compared with DFT. The heat capacities of condensed systems are within 11% of experimental findings, which is strong evidence that the dual descriptor provides an accurate framework for predicting both short-range intramolecular and long-range intermolecular interactions.
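To make the idea of a "dual descriptor" more concrete, the sketch below pairs a finely resolved short-range block with a coarse long-range block and concatenates them into a single feature vector. This is only an illustration under stated assumptions and is not the authors' ACE implementation. It keeps just radial two-body terms (real ACE descriptors also include angular, many-body contributions), it uses Gaussian radial functions with a cosine cutoff, and the cutoff radii and basis sizes (`rc_short=5.0`, `n_short=8`, `rc_long=10.0`, `n_long=3`) are hypothetical values that the abstract does not give.

```python
import math


def cosine_cutoff(r, rc):
    """Smooth cutoff: 1 at r = 0, decaying to 0 at r = rc, and 0 beyond rc."""
    if r >= rc:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / rc) + 1.0)


def radial_features(distances, rc, n_basis):
    """Two-body radial features (simplified stand-in for an ACE radial basis).

    Gaussians are centred on an even grid in [0, rc]; each neighbour distance
    contributes to every Gaussian, damped by the cutoff function.
    Requires n_basis >= 2.
    """
    width = rc / n_basis
    centres = [k * rc / (n_basis - 1) for k in range(n_basis)]
    feats = [0.0] * n_basis
    for r in distances:
        fc = cosine_cutoff(r, rc)
        if fc == 0.0:
            continue  # neighbour lies outside this block's cutoff sphere
        for k, mu in enumerate(centres):
            feats[k] += math.exp(-((r - mu) / width) ** 2) * fc
    return feats


def dual_descriptor(distances, rc_short=5.0, n_short=8, rc_long=10.0, n_long=3):
    """Concatenate a fine short-range block with a coarse long-range block.

    Hypothetical parameters: many basis functions inside a short cutoff
    (intramolecular detail) plus a few inside a long cutoff (weak
    intermolecular interactions).
    """
    short_block = radial_features(distances, rc_short, n_short)
    long_block = radial_features(distances, rc_long, n_long)
    return short_block + long_block


if __name__ == "__main__":
    # Hypothetical neighbour distances (in angstrom) around one atom.
    neighbours = [1.1, 1.5, 2.5, 4.0, 7.0, 9.5]
    d = dual_descriptor(neighbours)
    print(len(d))  # 8 short-range + 3 long-range features
```

With these settings, neighbours at 7.0 and 9.5 Å fall outside the short cutoff and affect only the three long-range features. The descriptor therefore stays compact while still registering distant molecules, which is the trade-off the dual design targets.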