Smith J S, Isayev O, Roitberg A E
University of Florida, Department of Chemistry, PO Box 117200, Gainesville, FL, USA 32611-7200.
University of North Carolina at Chapel Hill, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, Chapel Hill, NC, USA 27599.
Chem Sci. 2017 Apr 1;8(4):3192-3203. doi: 10.1039/c6sc05720a. Epub 2017 Feb 8.
Deep learning is revolutionizing many areas of science and technology, especially image, text, and speech recognition. In this paper, we demonstrate how a deep neural network (NN) trained on quantum mechanical (QM) DFT calculations can learn an accurate and transferable potential for organic molecules. We introduce ANAKIN-ME (Accurate NeurAl networK engINe for Molecular Energies), or ANI for short. ANI is a new method for developing transferable neural network potentials. It uses a highly modified version of the Behler and Parrinello symmetry functions to build single-atom atomic environment vectors (AEVs) as a molecular representation. AEVs make it possible to train neural networks on data that span both configurational and conformational space, a feat not previously accomplished on this scale. We used ANI to build a potential called ANI-1, which was trained on a subset of the GDB databases with up to 8 heavy atoms in order to predict total energies for organic molecules containing four atom types: H, C, N, and O. To obtain an accelerated but physically relevant sampling of molecular potential surfaces, we also propose a Normal Mode Sampling (NMS) method for generating molecular conformations. Through a series of case studies, we show that ANI-1 is chemically accurate compared to reference DFT calculations on much larger molecular systems (up to 54 atoms) than those included in the training data set.
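To make the idea of an atomic environment vector more concrete, the sketch below computes the radial part of a Behler-Parrinello-style descriptor for one atom: a sum of Gaussians of the neighbor distances, each damped by a cosine cutoff and evaluated at several radial shifts. It is only an illustration of the general technique the abstract refers to, not the ANI-1 implementation. The abstract does not specify the modifications ANI makes, and the sketch omits the angular terms and any separation by element type. The function names (`cosine_cutoff`, `radial_aev`), the parameter values (`eta`, `shifts`, `r_cut`), and the water-like geometry are all assumptions chosen for illustration.

```python
import math

def cosine_cutoff(r, r_cut):
    """Cosine cutoff: 1 at r = 0, falling smoothly to 0 at r = r_cut."""
    if r > r_cut:
        return 0.0
    return 0.5 * math.cos(math.pi * r / r_cut) + 0.5

def radial_aev(coords, center, eta, shifts, r_cut):
    """Radial symmetry-function values for atom `center`, one per shift.

    Each entry is sum_j exp(-eta * (R_ij - R_s)^2) * f_c(R_ij) over all
    neighbors j. Hypothetical helper, not taken from the paper.
    """
    xi = coords[center]
    values = [0.0] * len(shifts)
    for j, xj in enumerate(coords):
        if j == center:
            continue  # an atom is not its own neighbor
        r = math.dist(xi, xj)
        fc = cosine_cutoff(r, r_cut)
        if fc == 0.0:
            continue  # neighbor lies outside the cutoff sphere
        for k, r_s in enumerate(shifts):
            values[k] += math.exp(-eta * (r - r_s) ** 2) * fc
    return values

# Illustrative water-like geometry in angstroms (assumed, not from the paper).
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

# Assumed hyperparameters: Gaussian width, radial shifts, and cutoff radius.
aev_o = radial_aev(water, center=0, eta=16.0, shifts=[0.5, 1.0, 1.5], r_cut=4.6)
```

Because the descriptor depends only on interatomic distances and sums over neighbors, it is unchanged when the molecule is translated or when the neighbor atoms are listed in a different order, which is the property that makes such vectors usable as a neural network input.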