Coley Connor W, Barzilay Regina, Jaakkola Tommi S, Green William H, Jensen Klavs F
Department of Chemical Engineering and Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States.
ACS Cent Sci. 2017 May 24;3(5):434-443. doi: 10.1021/acscentsci.7b00064. Epub 2017 Apr 18.
Computer assistance in synthesis design has existed for over 40 years, yet retrosynthesis planning software has struggled to achieve widespread adoption. One critical challenge in developing high-quality pathway suggestions is that proposed reaction steps often fail when attempted in the laboratory, despite initially seeming viable. The true measure of success for any synthesis program is whether the predicted outcome matches what is observed experimentally. We report a model framework for anticipating reaction outcomes that combines the traditional use of reaction templates with the flexibility in pattern recognition afforded by neural networks. Using 15 000 experimental reaction records from granted United States patents, a model is trained to select the major (recorded) product by ranking a self-generated list of candidates where one candidate is known to be the major product. Candidate reactions are represented using a unique edit-based representation that emphasizes the fundamental transformation from reactants to products, rather than the constituent molecules' overall structures. In a 5-fold cross-validation, the trained model assigns the major product rank 1 in 71.8% of cases, rank ≤3 in 86.7% of cases, and rank ≤5 in 90.8% of cases.
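The scoring and evaluation protocol described above can be sketched in a few lines of Python. In this protocol, every self-generated candidate for a reaction receives a score, the candidates are ranked, and accuracy is reported as how often the recorded major product lands at rank 1, ≤3, or ≤5. This is a minimal illustration under stated assumptions, not the authors' code. It assumes each candidate reaction has already been mapped to a single scalar score by some trained network, and it omits both the template-based candidate generation and the edit-based featurization. It also uses a softmax over the candidate list with a cross-entropy loss as one plausible objective for picking the known major product. The function names (`softmax`, `ranking_loss`, `rank_of_true`, `top_k_accuracy`) and the example scores are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw candidate scores into probabilities over one reaction's candidate list."""
    m = max(scores)  # shift by the max so math.exp cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ranking_loss(scores: Sequence[float], true_idx: int) -> float:
    """Cross-entropy of the recorded (major) product, via a numerically stable log-softmax.

    Hypothetical training objective: equals -log(softmax(scores)[true_idx]).
    """
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[true_idx]


def rank_of_true(scores: Sequence[float], true_idx: int) -> int:
    """1-based rank of the recorded product; ties count against it (a pessimistic choice)."""
    true_score = scores[true_idx]
    better_or_equal = sum(
        1 for i, s in enumerate(scores) if i != true_idx and s >= true_score
    )
    return 1 + better_or_equal


def top_k_accuracy(reactions: Sequence[Tuple[Sequence[float], int]], k: int) -> float:
    """Fraction of reactions whose recorded product is ranked <= k."""
    hits = sum(1 for scores, true_idx in reactions if rank_of_true(scores, true_idx) <= k)
    return hits / len(reactions)


# Hypothetical model output: (score for each candidate, index of the recorded product).
reactions = [
    ([2.0, 0.5, -1.0], 0),                 # recorded product ranked 1st
    ([0.1, 1.5, 0.3, -0.2], 2),            # ranked 2nd
    ([0.0, 3.0, 2.0, 1.0, 0.5, -0.5], 5),  # ranked 6th
]

for k in (1, 3, 5):
    print(f"top-{k} accuracy: {top_k_accuracy(reactions, k):.3f}")
```

For these three made-up reactions the loop prints 0.333, 0.667, and 0.667 for top-1, top-3, and top-5. Counting ties against the recorded product keeps the accuracy from being inflated when a model assigns identical scores to several candidates. Whether the original evaluation handled ties this way is not stated in the abstract.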