Jackson Riley, Zhang Wenyuan, Pearson Jason
Department of Chemistry, University of Prince Edward Island, Canada
Chem Sci. 2021 Jun 23;12(29):10022-10040. doi: 10.1039/d1sc01206a. eCollection 2021 Jul 28.
Transition states are among the most important molecular structures in chemistry, critical to a variety of fields such as reaction kinetics, catalyst design, and the study of protein function. However, transition states are very unstable, typically existing only on the order of femtoseconds. The transient nature of these structures makes them incredibly difficult to study experimentally, so chemists often turn to simulation. Unfortunately, computer simulation of transition states is also challenging, as they are first-order saddle points on high-dimensional potential energy surfaces. Locating these points is resource-intensive and unreliable, and the methods used to do so can take a very long time to converge. Machine learning, a relatively novel class of algorithms, has driven radical changes in several fields of computation, including computer vision and natural language processing, owing to its aptitude for highly accurate function approximation. While machine learning has been widely adopted throughout computational chemistry as a lightweight alternative to costly quantum mechanical calculations, little research has used machine learning for transition state structure optimization. In this paper TSNet is presented: a new end-to-end Siamese message-passing neural network, based on tensor field networks, that is shown to be capable of predicting transition state geometries. Also presented is a small dataset of SN2 reactions that includes transition state structures, the first of its kind built specifically for machine learning. Finally, transfer learning, a remedial technique for low-data settings, is explored to determine whether pretraining TSNet on widely available chemical data can provide better starting points during training, faster convergence, and lower loss values. Aspects of the new dataset and model are discussed in detail, along with motivations and the general outlook for machine learning-based transition state prediction.
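The abstract characterizes transition states as first-order saddle points: stationary points where the energy gradient vanishes and the Hessian has exactly one negative eigenvalue (one direction of negative curvature, along the reaction coordinate). The minimal sketch below illustrates that criterion on a hypothetical two-dimensional toy surface, `potential`, with a double well along x and a harmonic valley along y. It uses finite differences in place of the quantum-mechanical gradients and Hessians a real calculation would supply, and it is not TSNet or the authors' method; every function name here is illustrative.

```python
import math


def potential(x, y):
    """Hypothetical toy surface (an assumption for illustration only):
    a double well along x and a harmonic valley along y.
    Minima sit at (+1, 0) and (-1, 0); the barrier top is at (0, 0)."""
    return (x**2 - 1.0) ** 2 + y**2


def gradient(f, x, y, h=1e-5):
    # Central finite differences for the first derivatives.
    gx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    gy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return gx, gy


def hessian(f, x, y, h=1e-4):
    # Central finite differences for the second derivatives.
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) - f(x + h, y - h)
           - f(x - h, y + h) + f(x - h, y - h)) / (4 * h**2)
    return fxx, fxy, fyy


def eigenvalues_2x2(a, b, c):
    # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, c]].
    mean = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b**2)
    return mean - radius, mean + radius


def classify_stationary_point(f, x, y, tol=1e-6):
    # A stationary point requires a (numerically) vanishing gradient.
    gx, gy = gradient(f, x, y)
    if math.hypot(gx, gy) > tol:
        return "not stationary"
    # Count directions of negative curvature at the stationary point.
    n_negative = sum(1 for lam in eigenvalues_2x2(*hessian(f, x, y)) if lam < 0)
    if n_negative == 0:
        return "minimum"
    if n_negative == 1:
        return "first-order saddle point"
    return f"stationary point with {n_negative} negative curvatures"


for point in [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0), (0.5, 0.0)]:
    print(point, classify_stationary_point(potential, *point))
```

On this toy surface the origin is the barrier separating the two wells at x = ±1, so it classifies as a first-order saddle point while the wells classify as minima. A real transition state search must locate such a point in the 3N Cartesian coordinates of an N-atom system, with each gradient and Hessian evaluation requiring an electronic-structure calculation.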