Datasets and their influence on the development of computer assisted synthesis planning tools in the pharmaceutical domain.

Author information

Thakkar Amol, Kogej Thierry, Reymond Jean-Louis, Engkvist Ola, Bjerrum Esben Jannik

Affiliations

Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden. Email:

Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland. Email:

Publication information

Chem Sci. 2019 Nov 5;11(1):154-168. doi: 10.1039/c9sc04944d. eCollection 2020 Jan 7.

DOI: 10.1039/c9sc04944d

PMID: 32110367

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7012039/

Abstract

Computer Assisted Synthesis Planning (CASP) has gained considerable interest as of late. Herein we investigate a template-based retrosynthetic planning tool, trained on a variety of datasets consisting of up to 17.5 million reactions. We demonstrate that models trained on datasets such as internal Electronic Laboratory Notebooks (ELN) and the publicly available United States Patent and Trademark Office (USPTO) extracts are sufficient for the prediction of full synthetic routes to compounds of interest in medicinal chemistry. To show this, we assessed the models on 1731 compounds from 41 virtual libraries for which experimental results were known. Furthermore, we show that accuracy is a misleading metric for assessing the policy network, and propose that the number of successfully applied templates, in conjunction with the overall ability to generate full synthetic routes, be examined instead. Using these criteria, we found that the specificity of the templates comes at the cost of generalizability and overall model performance. This is supplemented by a comparison of the underlying datasets and their corresponding models.
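The abstract contrasts top-k accuracy of the template-predicting policy network with the number of predicted templates that can actually be applied to a target. The following is a minimal sketch of computing both numbers side by side, not the authors' implementation. It assumes templates are identified by string IDs and that applicability is decided by an external check; the `applies` callback, the `evaluate_policy` function, the `PolicyMetrics` type and the toy data are all hypothetical and stand in for applying each reaction template to the target with a cheminformatics toolkit.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class PolicyMetrics:
    top_k_accuracy: float   # fraction of targets whose recorded template is ranked in the top k
    mean_applicable: float  # mean number of top-k templates that apply to their target


def evaluate_policy(
    predictions: Sequence[Sequence[str]],
    recorded: Sequence[str],
    applies: Callable[[int, str], bool],
    k: int = 50,
) -> PolicyMetrics:
    """Compare top-k accuracy with the count of successfully applied templates.

    predictions[i] -- template IDs ranked by the policy network for target i
    recorded[i]    -- template ID extracted from the reaction recorded for target i
    applies(i, t)  -- hypothetical check: does template t yield reactants for target i?
    """
    if not recorded or len(predictions) != len(recorded):
        raise ValueError("predictions and recorded must be non-empty and the same length")
    hits = 0
    applicable = 0
    for i, (ranked, true_template) in enumerate(zip(predictions, recorded)):
        top = list(ranked[:k])
        # Accuracy only rewards recovering the single recorded template.
        hits += int(true_template in top)
        # The alternative metric counts every top-k template that applies.
        applicable += sum(1 for t in top if applies(i, t))
    n = len(recorded)
    return PolicyMetrics(hits / n, applicable / n)


# Toy data (invented for illustration): the recorded template for target 0 is
# missed, yet the policy still proposes two templates that apply to it.
predictions = [["t1", "t2", "t3"], ["t4", "t5", "t6"]]
recorded = ["t9", "t5"]
applicable_sets = {0: {"t1", "t2"}, 1: {"t4", "t5", "t6"}}

metrics = evaluate_policy(predictions, recorded, lambda i, t: t in applicable_sets[i], k=3)
print(metrics)  # PolicyMetrics(top_k_accuracy=0.5, mean_applicable=2.5)
```

In this toy case accuracy counts target 0 as a failure even though the policy offered two usable disconnections, which is the kind of gap between accuracy and route-finding usefulness that the abstract describes.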

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a6b/7012039/19c17c43beac/c9sc04944d-f1.jpg
