Schmitz Gunnar, Godtliebsen Ian Heide, Christiansen Ove
Department of Chemistry, Aarhus Universitet, DK-8000 Aarhus, Denmark.
J Chem Phys. 2019 Jun 28;150(24):244113. doi: 10.1063/1.5100141.
On the basis of a new, extensive database constructed for this purpose, we assess various Machine Learning (ML) algorithms for predicting energies in the framework of potential energy surface (PES) construction and discuss their black-box character, robustness, and efficiency. The database for training ML algorithms in energy predictions based on the molecular structure contains SCF, RI-MP2, RI-MP2-F12, and CCSD(F12)(T) data for around 10.5 × 10 configurations of 15 small molecules. The electronic energies as a function of molecular structure are computed from both static and iteratively refined grids in the context of automated PES construction for anharmonic vibrational computations within the n-mode expansion. We explore the performance of a range of algorithms including Gaussian Process Regression (GPR), Kernel Ridge Regression, Support Vector Regression, and Neural Networks (NNs). We also explore methods related to GPR such as Sparse Gaussian Process Regression, Gaussian Process Markov Chains, and Sparse Gaussian Process Markov Chains. For NNs, we report some explorations of architecture, activation functions, and numerical settings. Different delta-learning strategies are considered. Delta learning that targets CCSD(F12)(T) predictions by combining, for example, RI-MP2 energies with machine-learned CCSD(F12)(T)-RI-MP2 differences is found to be an attractive option.
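The delta-learning idea described in the abstract can be illustrated with a minimal sketch: a model is trained on the difference between a high-level and a low-level energy, and its prediction is added to the cheap low-level energy. Everything in the example is an assumption made for illustration only. The two Morse curves are toy stand-ins for RI-MP2 and CCSD(F12)(T) energies of a single coordinate, the kernel ridge regression is a plain NumPy implementation, and the function names and hyperparameters (`length_scale`, `reg`) are hypothetical. None of it is taken from the paper's database or code.

```python
import numpy as np

def morse(r, d_e, a, r_e):
    """Toy 1D Morse potential used as a stand-in for an electronic energy."""
    return d_e * (1.0 - np.exp(-a * (r - r_e))) ** 2

# Hypothetical stand-ins: these are NOT real RI-MP2 or CCSD(F12)(T) energies.
def low_level(r):
    return morse(r, 0.160, 1.05, 1.02)

def high_level(r):
    return morse(r, 0.170, 1.00, 1.00)

def rbf_kernel(x1, x2, length_scale):
    """Gaussian (RBF) kernel matrix between two 1D point sets."""
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def krr_fit(x, y, length_scale, reg):
    """Solve (K + reg * I) alpha = y for the kernel ridge regression weights."""
    k = rbf_kernel(x, x, length_scale)
    return np.linalg.solve(k + reg * np.eye(len(x)), y)

def krr_predict(x_train, alpha, x_new, length_scale):
    """Evaluate the fitted kernel ridge regression model at new points."""
    return rbf_kernel(x_new, x_train, length_scale) @ alpha

# Assumed toy grid and hyperparameters.
r_train = np.linspace(0.7, 2.0, 12)
r_test = np.linspace(0.75, 1.95, 50)
length_scale, reg = 0.3, 1e-8

# Delta learning: learn the high-minus-low difference, then add the low level.
delta_train = high_level(r_train) - low_level(r_train)
alpha_delta = krr_fit(r_train, delta_train, length_scale, reg)
e_delta = low_level(r_test) + krr_predict(r_train, alpha_delta, r_test, length_scale)

# Direct learning of the high-level energy, for comparison.
alpha_direct = krr_fit(r_train, high_level(r_train), length_scale, reg)
e_direct = krr_predict(r_train, alpha_direct, r_test, length_scale)

reference = high_level(r_test)
rmse_delta = float(np.sqrt(np.mean((e_delta - reference) ** 2)))
rmse_direct = float(np.sqrt(np.mean((e_direct - reference) ** 2)))
print(f"RMSE delta learning:  {rmse_delta:.3e}")
print(f"RMSE direct learning: {rmse_direct:.3e}")
```

In this sketch the low-level energy must still be computed at every prediction point, so the approach pays off only when the low-level method is much cheaper than the high-level one and the difference is smoother or smaller than the energy itself.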