Department of Chemistry, Mellon College of Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States.
Acc Chem Res. 2021 Apr 6;54(7):1575-1585. doi: 10.1021/acs.accounts.0c00868. Epub 2021 Mar 13.
Machine learning interatomic potentials (MLIPs) are widely used for describing molecular energy and continue bridging the speed and accuracy gap between quantum mechanical (QM) and classical approaches like force fields. In this Account, we focus on out-of-the-box approaches to developing transferable MLIPs for diverse chemical tasks. First, we introduce the "Accurate Neural Network engine for Molecular Energies" (ANAKIN-ME, or ANI for short) method. The ANI model utilizes Justin Smith Symmetry Functions (JSSFs) and enables training on vast data sets. Training data sets several orders of magnitude larger than before have become the key factor in the knowledge transferability and flexibility of MLIPs. Since the quantity, quality, and types of interactions included in the training data set dictate the accuracy of an MLIP, the tasks of proper data selection and model training can be assisted with advanced methods like active learning (AL), transfer learning (TL), and multitask learning (MTL).

Next, we describe AIMNet, the "Atoms-in-Molecules Network," which was inspired by the quantum theory of atoms in molecules. The AIMNet architecture lifts multiple limitations of MLIPs: it encodes long-range interactions and learnable representations of chemical elements. We also discuss the AIMNet-ME model, which expands the applicability domain of AIMNet from neutral molecules toward open-shell systems. AIMNet-ME incorporates a dependence of the potential on molecular charge and spin. It brings ML and physical models one step closer, ensuring the correct behavior of the molecular energy over the total molecular charge.

We finally describe perhaps the simplest possible physics-aware model, which combines ML with the extended Hückel method. In ML-EHM, the "Hierarchically Interacting Particle Neural Network" (HIP-NN) generates a set of molecule- and environment-dependent Hamiltonian elements α and β. As a test example, we show how, in contrast to traditional Hückel theory, ML-EHM correctly describes orbital crossing upon bond rotation. Hence it learns the underlying physics, highlighting that the inclusion of proper physical constraints and symmetries can significantly improve the generalization of ML models.
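To make the descriptor side of ANI concrete, the sketch below implements a Behler–Parrinello-style radial symmetry function of the kind the JSSFs build on: each atom's neighborhood is encoded as a fixed-length vector of Gaussian-weighted distance sums inside a smooth cosine cutoff. This is a minimal illustration, not the published ANI implementation; the η, cutoff, and shift values are placeholder choices.

```python
import numpy as np

def cutoff(r, r_c=5.2):
    """Smooth cosine cutoff: 1 at r = 0, decaying to 0 at r >= r_c."""
    return np.where(r < r_c, 0.5 * np.cos(np.pi * r / r_c) + 0.5, 0.0)

def radial_features(r_ij, eta=16.0, n_shifts=16, r_c=5.2):
    """Radial symmetry-function vector for one atom.

    r_ij : 1-D array of distances to neighboring atoms.
    Returns one Gaussian-on-a-shell sum per shift R_s:
        G_s = sum_j exp(-eta * (r_ij - R_s)**2) * f_c(r_ij)
    All parameter values here are illustrative, not ANI's fitted ones.
    """
    shifts = np.linspace(0.9, r_c - 0.2, n_shifts)
    r = r_ij[:, None]                      # shape (n_neighbors, 1)
    g = np.exp(-eta * (r - shifts) ** 2)   # shape (n_neighbors, n_shifts)
    return (g * cutoff(r, r_c)).sum(axis=0)

# Example: an atom with three neighbors at 1.1, 1.5, and 2.4 angstroms
features = radial_features(np.array([1.1, 1.5, 2.4]))
print(features.shape)  # (16,) -- a fixed-size input for the atomic network
```

The fixed-length output is what lets one atomic neural network handle molecules of any size: each atom is mapped to the same feature dimensionality regardless of how many neighbors it has.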
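The orbital-crossing test admits a compact numerical illustration. In a two-center Hückel model of a twisted π bond, the off-diagonal element scales roughly as cos θ with the twist angle; textbook Hückel theory keeps β fixed and therefore predicts no angle dependence at all, whereas a geometry-dependent β (the kind of quantity HIP-NN generates in ML-EHM) yields the degeneracy at 90° and the level crossing beyond it. The α and β values below are illustrative, not fitted parameters from the Account.

```python
import numpy as np

# Illustrative simple-Hueckel parameters in eV; not fitted values.
ALPHA = -6.5   # on-site (Coulomb) element, alpha
BETA = -2.7    # nearest-neighbor (resonance) element, beta, at zero twist

def pi_levels(theta_deg):
    """Energies of the symmetric/antisymmetric pi combinations of a
    two-center Hueckel model H = [[alpha, b], [b, alpha]], with
    b = beta * cos(theta); the eigenvalues are alpha +/- b.

    The cos(theta) scaling stands in for the environment-dependent
    beta that a learned model would supply; a fixed-beta Hamiltonian
    gives angle-independent levels and can never show the crossing.
    """
    b = BETA * np.cos(np.radians(theta_deg))
    return ALPHA + b, ALPHA - b   # (symmetric, antisymmetric)

for theta in (0, 45, 90, 135, 180):
    e_sym, e_anti = pi_levels(theta)
    print(f"theta={theta:3d} deg: E_sym={e_sym:6.2f}  E_anti={e_anti:6.2f} eV")
# The two labeled levels meet at 90 degrees and swap order beyond it --
# the orbital crossing that a fixed-beta Hueckel model cannot reproduce.
```

Tracking the levels by the symmetry of their eigenvectors, rather than by sorted energy, is what makes the crossing visible: the sorted eigenvalues alone would only show a degeneracy at 90°.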