Amabilino Silvia, Bratholm Lars A, Bennie Simon J, Vaucher Alain C, Reiher Markus, Glowacki David R
School of Chemistry, University of Bristol, Bristol BS8 1TS, U.K.
Laboratory of Physical Chemistry, ETH Zurich, Zurich, Switzerland.
J Phys Chem A. 2019 May 23;123(20):4486-4499. doi: 10.1021/acs.jpca.9b01006. Epub 2019 Apr 18.
Not long ago, processing power was the primary bottleneck in many computational workflows. The rise of machine learning technologies has brought a paradigm shift that places increasing value on data curation, that is, on data size, quality, bias, format, and coverage. Increasingly, data-related issues are as important as the algorithmic methods used to process and learn from the data. Here we introduce an open-source, graphics processing unit-accelerated neural network (NN) framework for learning reactive potential energy surfaces (PESs). To obtain training data for this NN framework, we investigate the use of real-time interactive ab initio molecular dynamics in virtual reality (iMD-VR) as a new data curation strategy that enables human users to rapidly sample geometries along reaction pathways. Focusing on hydrogen abstraction reactions of the CN radical with isopentane, we compare the performance of NNs trained on iMD-VR data with that of NNs trained on data from a more traditional method, namely, molecular dynamics (MD) constrained to sample a predefined grid of points along the hydrogen abstraction reaction coordinate. Both the NN trained on iMD-VR data and the NN trained on constrained MD data reproduce important qualitative features of the reactive PESs, such as a low and early barrier to abstraction. Quantitative analysis shows that NN learning is sensitive to the data set used for training. Our results show that user-sampled structures obtained with the quantum chemical iMD-VR machinery enable excellent sampling in the vicinity of the minimum energy path (MEP). As a result, the NN trained on the iMD-VR data performs very well when predicting energies close to the MEP but less well when predicting energies for "off-path" structures. The NN trained on the constrained MD data performs better on high-energy off-path structures because its training set included a number of such structures.
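The near-path versus off-path comparison described above can be made concrete with a small evaluation helper. The following is only a minimal sketch, not the authors' analysis code: the function name `mae_by_path_distance`, the per-structure `dist_to_mep` values, and the `cutoff` threshold are all hypothetical, and how a distance to the MEP is defined is left as an assumption. It splits a test set into structures near the MEP and structures away from it, then reports the mean absolute error of predicted energies in each group.

```python
from math import nan


def mae_by_path_distance(e_ref, e_pred, dist_to_mep, cutoff):
    """Return (near_path_mae, off_path_mae) for predicted energies.

    All argument names are hypothetical. `dist_to_mep` is assumed to be a
    precomputed distance of each structure from the minimum energy path,
    in whatever metric the user chooses; `cutoff` separates near-path
    structures (distance <= cutoff) from off-path ones.
    """
    if not (len(e_ref) == len(e_pred) == len(dist_to_mep)):
        raise ValueError("all inputs must have the same length")

    near_errors, off_errors = [], []
    for ref, pred, dist in zip(e_ref, e_pred, dist_to_mep):
        # Route each absolute error into the group its structure belongs to.
        target = near_errors if dist <= cutoff else off_errors
        target.append(abs(pred - ref))

    def mean(errors):
        # An empty group has no defined error, so report NaN.
        return sum(errors) / len(errors) if errors else nan

    return mean(near_errors), mean(off_errors)


# Invented numbers, for illustration only (not data from the study).
reference = [0.0, 1.0, 2.0, 10.0]
predicted = [0.1, 0.9, 2.0, 13.0]
distances = [0.0, 0.1, 0.2, 1.5]
near_mae, off_mae = mae_by_path_distance(reference, predicted, distances, cutoff=0.5)
print(f"near-path MAE: {near_mae:.3f}, off-path MAE: {off_mae:.3f}")
```

Reporting the two errors separately, rather than a single pooled value, is one way to show whether a model's accuracy depends on where its training data were sampled.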