Enhancing Retrosynthetic Reaction Prediction with Deep Learning Using Multiscale Reaction Classification.

Author information

Computational Statistics and Bioinformatics Group, Advanced Artificial Intelligence Research Laboratory, WuXi NextCODE, Cambridge, Massachusetts 02142, United States.

Complex Biological Systems Alliance, Medford, Massachusetts 02155, United States.

Publication information

J Chem Inf Model. 2019 Feb 25;59(2):673-688. doi: 10.1021/acs.jcim.8b00801. Epub 2019 Feb 1.

Abstract

Chemical synthesis planning is a key aspect of many fields of chemistry, especially drug discovery. Recent implementations of machine learning and artificial intelligence techniques for retrosynthetic analysis have shown great potential to improve computational methods for synthesis planning. Herein, we present a multiscale, data-driven approach for retrosynthetic analysis with deep highway networks (DHN). We automatically extracted reaction rules (i.e., ways in which a molecule is produced) from a data set of chemical reactions derived from U.S. patents. We performed the retrosynthetic reaction prediction task in two steps. First, we built a DHN model to predict which group of reactions (consisting of chemically similar reaction rules) was employed to produce a molecule. Second, once a reaction group was identified, a DHN trained on the subset of reactions within that group was employed to predict the transformation rule used to produce the molecule. To validate our approach, we predicted the first retrosynthetic reaction step for 40 approved drugs using our multiscale model and compared its predictive performance with that of a control: a conventional model trained on all machine-extracted reaction rules. Our multiscale approach showed a success rate of 82.9% at generating valid reactants from retrosynthetic reaction predictions. By comparison, the control model yielded a success rate of 58.5% on the validation set of 40 pharmaceutical molecules, indicating a statistically significant improvement of our approach in matching the known first synthetic reaction of the drugs tested in this study. While our multiscale approach was unable to outperform state-of-the-art rule-based systems curated by expert chemists, multiscale classification represents a marked enhancement in retrosynthetic analysis and can be easily adapted for use in a range of artificial intelligence strategies.
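The two-step scheme described in the abstract can be illustrated with a minimal Python sketch. Nothing below is taken from the paper's code. The names `highway_layer` and `predict_two_step` are hypothetical, and the vector input, the ReLU transform, and the argmax decision rule are assumptions made for illustration. The gating formula is the standard highway-layer form, y = T(x) * H(x) + (1 - T(x)) * x.

```python
import numpy as np


def sigmoid(z):
    """Logistic function used for the transform gate."""
    return 1.0 / (1.0 + np.exp(-z))


def highway_layer(x, W_h, b_h, W_t, b_t):
    """One highway layer: y = T(x) * H(x) + (1 - T(x)) * x.

    ReLU for H(x) is an illustrative choice, not a detail from the abstract.
    """
    h = np.maximum(0.0, x @ W_h + b_h)  # candidate transform H(x)
    t = sigmoid(x @ W_t + b_t)          # transform gate T(x), values in (0, 1)
    return t * h + (1.0 - t) * x        # gate blends the transform with the input


def predict_two_step(features, group_model, rule_models):
    """Hypothetical two-step dispatch.

    group_model: callable returning one score per reaction group.
    rule_models: mapping from group index to a callable that returns one
                 score per reaction rule within that group.
    """
    # Step 1: choose the most likely reaction group.
    group = int(np.argmax(group_model(features)))
    # Step 2: choose a rule using only the model trained for that group.
    rule = int(np.argmax(rule_models[group](features)))
    return group, rule
```

In this sketch each group-specific model only has to discriminate among the rules of its own group, which is the intuition behind splitting one large classification problem into two smaller ones.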

