Department of Chemistry and Cherry L. Emerson Center for Scientific Computation, Emory University, Atlanta, Georgia 30322, USA.
Independent Researcher, Toronto, Canada.
J Chem Phys. 2022 Jun 28;156(24):240901. doi: 10.1063/5.0089200.
There has been great progress in developing methods for machine-learned potential energy surfaces. There have also been important assessments of these methods by comparing so-called learning curves on datasets of electronic energies and forces, notably the MD17 database. The dataset for each molecule in this database generally consists of tens of thousands of energies and forces obtained from DFT direct dynamics at 500 K. We contrast the datasets from this database for three "small" molecules, ethanol, malonaldehyde, and glycine, with datasets we have generated with specific targets for the potential energy surfaces (PESs) in mind: a rigorous calculation of the zero-point energy and wavefunction, the tunneling splitting in malonaldehyde, and, in the case of glycine, a description of all eight low-lying conformers. We find that the MD17 datasets are too limited for these targets. We also examine recent datasets for several PESs that describe complex chemical reactions of small molecules. Finally, we introduce a new database, "QM-22," which contains datasets for molecules ranging from 4 to 15 atoms that extend to high energies and span a wide range of configurations.