Department of Chemistry, Boston College, Chestnut Hill, Massachusetts 02467, United States.
Department of Computer Science, San Francisco State University, San Francisco, California 94132, United States.
J Chem Theory Comput. 2022 Sep 13;18(9):5739-5754. doi: 10.1021/acs.jctc.2c00546. Epub 2022 Aug 8.
Gaussian process (GP) regression has recently been developed as an effective method for molecular geometry optimization. The prior mean function is a crucial component of a GP. We design and validate two types of physically inspired prior mean functions: force-field-based priors and posterior-type priors. For the posterior-type priors, we implement a dual-level training (DLT) optimizer. DLT optimizers can be regarded as a class of optimization algorithms within the delta-machine learning paradigm, but with several major differences from previously proposed algorithms in that paradigm. In the first level of the DLT, we incorporate classical mechanical descriptions of the equilibrium geometries into the prior function, which improves the performance of the GP optimizer relative to one that uses a constant (or zero) prior. In the second level, we use the surrogate potential energy surfaces (PESs), which incorporate the physics learned in the first-level training, as the prior function to refine the model further. We find that the force-field-based priors and posterior-type priors reduce the overall number of optimization steps by a factor of 2-3 compared with both the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimizer and the constant-prior GP optimizer proposed in previous works. We also demonstrate the potential of recovering the real PESs using a GP with a force-field prior. This work shows the importance of including domain knowledge as an ingredient in the GP, which offers a potentially robust learning model for molecular geometry optimization and for exploring molecular PESs.
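To illustrate why the prior mean matters, the following is a minimal sketch of GP regression with a non-zero prior mean on a one-dimensional toy problem. It is not the optimizer described in this work: the Morse curve standing in for the "real" PES, the deliberately mis-parameterized Morse curve standing in for a force field, the RBF kernel, and all function names and parameter values are assumptions made for illustration only. The GP is fit to the residual between the training energies and the prior, so the prediction falls back to the prior wherever no training data are nearby.

```python
import numpy as np


def rbf_kernel(xa, xb, length=0.3, variance=1.0):
    """Squared-exponential kernel between two 1D arrays of bond lengths."""
    diff = xa[:, None] - xb[None, :]
    return variance * np.exp(-0.5 * (diff / length) ** 2)


def gp_posterior_mean(x_train, y_train, x_query, prior_mean, noise=1e-8):
    """Posterior mean m(x*) + K(x*, X) [K(X, X) + noise*I]^-1 (y - m(X))."""
    gram = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    residual = y_train - prior_mean(x_train)  # the GP only learns the correction
    weights = np.linalg.solve(gram, residual)
    return prior_mean(x_query) + rbf_kernel(x_query, x_train) @ weights


def morse(r, depth=1.0, a=1.0, r0=1.0):
    """Toy stand-in for the 'real' PES (assumed, not from the paper)."""
    return depth * (1.0 - np.exp(-a * (r - r0))) ** 2


def approx_force_field(r):
    """Hypothetical force-field prior: a Morse curve with slightly wrong parameters."""
    return morse(r, depth=0.9, a=1.1, r0=1.0)


def zero_prior(r):
    """Constant (zero) prior mean."""
    return np.zeros_like(r)


# Three "expensive" energy evaluations near the equilibrium bond length.
r_train = np.array([0.8, 1.0, 1.3])
e_train = morse(r_train)

# Query a training point, an interpolated point, and a point far from the data.
r_query = np.array([0.8, 1.15, 3.0])
pred_zero = gp_posterior_mean(r_train, e_train, r_query, zero_prior)
pred_ff = gp_posterior_mean(r_train, e_train, r_query, approx_force_field)

for r, p0, pf in zip(r_query, pred_zero, pred_ff):
    print(f"r={r:.2f}  true={morse(r):.4f}  zero-prior={p0:.4f}  ff-prior={pf:.4f}")
```

Both models reproduce the training energies, but at r = 3.0, far from the data, the zero-prior model reverts to zero while the force-field-prior model reverts to the approximate force field, which in this toy setup lies much closer to the true curve. Because `prior_mean` is an ordinary callable, the posterior mean of a previously trained GP could in principle be passed in the same way; the details of the DLT scheme itself are not reproduced here.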