Zhong Zipeng, Song Jie, Feng Zunlei, Liu Tiantao, Jia Lingxiang, Yao Shaolun, Wu Min, Hou Tingjun, Song Mingli
College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, P. R. China
School of Software Technology, Zhejiang University, Ningbo 315048, P. R. China
Chem Sci. 2022 Jul 12;13(31):9023-9034. doi: 10.1039/d2sc02763a. eCollection 2022 Aug 10.
Chemical reaction prediction, which comprises forward synthesis prediction and retrosynthesis prediction, is a fundamental problem in organic synthesis. A popular computational paradigm formulates synthesis prediction as a sequence-to-sequence translation problem in which molecules are represented as SMILES strings. However, general-purpose SMILES ignores a key characteristic of chemical reactions: the molecular graph topology is largely unaltered from reactants to products. As a result, SMILES performs suboptimally when applied directly. In this article, we propose root-aligned SMILES (R-SMILES), which specifies a tightly aligned one-to-one mapping between the product and reactant SMILES for more efficient synthesis prediction. Because of this strict one-to-one mapping and the reduced edit distance, the model is largely relieved of learning complex syntax and can concentrate on learning the chemical knowledge of reactions. We compare R-SMILES with various state-of-the-art baselines and show that it significantly outperforms them all.
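The reduced edit distance mentioned in the abstract can be illustrated with a small toy sketch. It is not the authors' implementation: the reaction (ethanol plus acetic acid giving ethyl acetate), the hand-written SMILES strings, and the helper name `edit_distance` are all assumptions chosen for illustration, and the "aligned" reactant string was written by hand to start from the same atom as the product rather than generated by the R-SMILES procedure. Distances are computed per character, which matches token-level distance here only because every token in these strings is a single character.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (hypothetical helper)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Toy retrosynthesis pair (assumed example, hand-written SMILES).
product = "CCOC(C)=O"                # ethyl acetate, written from the ethyl CH3
reactants_unaligned = "CC(=O)O.CCO"  # acetic acid + ethanol, arbitrary atom order
reactants_aligned = "CCO.OC(C)=O"    # same molecules, written to follow the product

d_unaligned = edit_distance(product, reactants_unaligned)
d_aligned = edit_distance(product, reactants_aligned)

print("unaligned distance:", d_unaligned)
print("aligned distance:", d_aligned)
```

In this toy case the aligned reactant string differs from the product only by the inserted `.O`, so a sequence model would mostly copy its input, whereas the unaligned string encodes the same molecules but requires more edits.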