Datasets and their influence on the development of computer assisted synthesis planning tools in the pharmaceutical domain.

Author information

Thakkar Amol, Kogej Thierry, Reymond Jean-Louis, Engkvist Ola, Bjerrum Esben Jannik

Affiliations

Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden.

Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland.

Publication information

Chem Sci. 2019 Nov 5;11(1):154-168. doi: 10.1039/c9sc04944d. eCollection 2020 Jan 7.

Abstract

Computer Assisted Synthesis Planning (CASP) has recently attracted considerable interest. Herein we investigate a template-based retrosynthetic planning tool trained on a variety of datasets containing up to 17.5 million reactions. We demonstrate that models trained on datasets such as internal Electronic Laboratory Notebooks (ELN) and the publicly available United States Patent and Trademark Office (USPTO) extracts are sufficient to predict full synthetic routes to compounds of interest in medicinal chemistry. To this end, we assessed the models on 1731 compounds from 41 virtual libraries for which experimental results were known. Furthermore, we show that accuracy is a misleading metric for assessing the policy network, and propose examining instead the number of successfully applied templates in conjunction with the overall ability to generate full synthetic routes. Using this approach, we found that template specificity comes at the cost of generalizability and overall model performance. This is supplemented by a comparison of the underlying datasets and their corresponding models.
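
The abstract contrasts policy-network accuracy with two alternative signals: how many predicted templates actually apply to a target, and whether a full route is found. The sketch below illustrates that contrast under stated assumptions and is not the authors' evaluation code. `PolicyResult`, `applies`, and the template IDs are hypothetical names; template application is abstracted as a callback (in practice it would run each reaction template on the target molecule with a cheminformatics toolkit), and the `route_found` flag is assumed to come from a separate tree search.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class PolicyResult:
    """One target compound evaluated by a hypothetical template-based planner."""
    product: str                     # target SMILES
    ranked_templates: Sequence[str]  # template IDs ranked by the policy network, best first
    recorded_template: str           # template extracted from the reaction recorded in the dataset
    route_found: bool                # assumed output of a separate route search


def top_k_accuracy(results: Sequence[PolicyResult], k: int) -> float:
    """Fraction of targets whose recorded template appears in the top-k predictions."""
    hits = sum(r.recorded_template in r.ranked_templates[:k] for r in results)
    return hits / len(results)


def mean_applied_templates(
    results: Sequence[PolicyResult],
    k: int,
    applies: Callable[[str, str], bool],
) -> float:
    """Mean number of top-k templates that apply successfully to each target.

    `applies(template_id, product_smiles)` is a hypothetical stand-in for running
    the template on the product and checking that it yields precursors.
    """
    counts = [sum(applies(t, r.product) for t in r.ranked_templates[:k]) for r in results]
    return sum(counts) / len(results)


def solved_fraction(results: Sequence[PolicyResult]) -> float:
    """Fraction of targets for which a full synthetic route was found."""
    return sum(r.route_found for r in results) / len(results)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    results = [
        PolicyResult("CCO", ["t1", "t2", "t3"], "t2", route_found=True),
        PolicyResult("c1ccccc1O", ["t4", "t5", "t6"], "t9", route_found=True),
    ]
    applicable = {("t1", "CCO"), ("t2", "CCO"), ("t4", "c1ccccc1O"), ("t5", "c1ccccc1O")}

    def applies(template_id: str, smiles: str) -> bool:
        # Lookup table standing in for real template application.
        return (template_id, smiles) in applicable

    print(top_k_accuracy(results, k=1))                 # 0.0
    print(top_k_accuracy(results, k=3))                 # 0.5
    print(mean_applied_templates(results, 3, applies))  # 2.0
    print(solved_fraction(results))                     # 1.0
```

In this toy case top-1 accuracy is zero, yet every target has two applicable templates and a route, which is the kind of mismatch that makes accuracy alone a poor guide to planner usefulness.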

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a6b/7012039/19c17c43beac/c9sc04944d-f1.jpg