Department of Chemistry, The University of Chicago, Chicago, IL, 60637, USA.
Pritzker School of Molecular Engineering, The University of Chicago, Chicago, IL, 60637, USA.
Sci Data. 2024 Feb 20;11(1):222. doi: 10.1038/s41597-024-03019-3.
System-specific neural force fields (NFFs) have gained popularity in computational chemistry. One of the most popular benchmarks for developing NFF models is the MD17 dataset and its subsequent extension. These datasets comprise geometries from the equilibrium region of the ground electronic state potential energy surface, sampled from direct adiabatic dynamics. However, many chemical reactions involve significant molecular geometrical deformations, for example, bond breaking. Therefore, MD17 is inadequate to represent a chemical reaction. To address this limitation in MD17, we introduce a new dataset, called the Extended Excited-state Molecular Dynamics (xxMD) dataset. The xxMD dataset involves geometries sampled from direct nonadiabatic dynamics, and the energies are computed at both the multireference wavefunction theory and density functional theory levels. We show that the xxMD dataset involves diverse geometries which represent chemical reactions. Assessment of NFF models on the xxMD dataset reveals significantly higher predictive errors than those reported for MD17 and its variants. This work underscores the challenges faced in crafting a generalizable NFF model with extrapolation capability.