Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California 91125, USA.
Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials, Department of Chemistry, University of Basel, Basel, Switzerland.
J Chem Phys. 2019 Apr 7;150(13):131103. doi: 10.1063/1.5088393.
We address the degree to which machine learning (ML) can be used to accurately and transferably predict post-Hartree-Fock correlation energies. Refined strategies for feature design and selection are presented, and the molecular-orbital-based machine learning (MOB-ML) method is applied to several test systems. Strikingly, for second-order Møller-Plesset perturbation theory, coupled cluster with singles and doubles (CCSD), and CCSD with perturbative triples levels of theory, it is shown that the thermally accessible (350 K) potential energy surface for a single water molecule can be described to within 1 mhartree using a model that is trained from only a single reference calculation at a randomized geometry. To explore the breadth of chemical diversity that can be described, MOB-ML is also applied to a new dataset of thermalized (350 K) geometries of 7211 organic molecules with up to seven heavy atoms. In comparison with the previously reported Δ-ML method, MOB-ML is shown to reach chemical accuracy with threefold fewer training geometries. Finally, a transferability test in which models trained for seven-heavy-atom systems are used to predict energies for thirteen-heavy-atom systems reveals that MOB-ML reaches chemical accuracy with 36-fold fewer training calculations than Δ-ML (140 vs 5000 training calculations).
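The figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the conversion factor 1 hartree ≈ 627.5095 kcal/mol is the standard value, and taking "chemical accuracy" to mean 1 kcal/mol is a common convention assumed here, since the abstract does not define it.

```python
# Illustrative arithmetic only; the names below are hypothetical helpers,
# not part of any MOB-ML or Δ-ML code.

HARTREE_TO_KCAL_PER_MOL = 627.5095  # standard conversion factor

# Assumption: "chemical accuracy" is taken as 1 kcal/mol (common convention).
CHEMICAL_ACCURACY_KCAL_PER_MOL = 1.0


def mhartree_to_kcal_per_mol(energy_mhartree: float) -> float:
    """Convert an energy in millihartree to kcal/mol."""
    return energy_mhartree * 1e-3 * HARTREE_TO_KCAL_PER_MOL


def fold_reduction(baseline_count: int, reduced_count: int) -> float:
    """Ratio of training calculations needed by the baseline vs. the new method."""
    return baseline_count / reduced_count


# 1 mhartree error bound quoted for the water potential energy surface.
water_error_kcal = mhartree_to_kcal_per_mol(1.0)

# 5000 (Δ-ML) vs 140 (MOB-ML) training calculations in the transferability test.
transfer_ratio = fold_reduction(5000, 140)

print(f"1 mhartree = {water_error_kcal:.4f} kcal/mol")
print(f"5000 / 140 = {transfer_ratio:.1f}-fold")
```

Under the assumed 1 kcal/mol convention, the 1 mhartree bound for water lies below chemical accuracy, and the ratio 5000/140 rounds to the 36-fold reduction stated in the abstract.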