Datasets and their influence on the development of computer assisted synthesis planning tools in the pharmaceutical domain.

Author information

Thakkar Amol, Kogej Thierry, Reymond Jean-Louis, Engkvist Ola, Bjerrum Esben Jannik

Affiliations

Hit Discovery, Discovery Sciences, R&D, AstraZeneca, Gothenburg, Sweden. Email:

Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland. Email:

Publication information

Chem Sci. 2019 Nov 5;11(1):154-168. doi: 10.1039/c9sc04944d. eCollection 2020 Jan 7.

Abstract

Computer Assisted Synthesis Planning (CASP) has recently attracted considerable interest. Herein we investigate a template-based retrosynthetic planning tool trained on a variety of datasets containing up to 17.5 million reactions. We demonstrate that models trained on datasets such as internal Electronic Laboratory Notebooks (ELN) and the publicly available United States Patent and Trademark Office (USPTO) extracts are sufficient for predicting full synthetic routes to compounds of interest in medicinal chemistry. To show this, we assessed the models on 1731 compounds from 41 virtual libraries for which experimental results were known. Furthermore, we show that accuracy is a misleading metric for assessing the policy network, and propose that the number of successfully applied templates, in conjunction with the overall ability to generate full synthetic routes, be examined instead. We also found that template specificity comes at the cost of generalizability and overall model performance. This analysis is supplemented by a comparison of the underlying datasets and their corresponding models.
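The contrast between accuracy and the number of successfully applied templates can be illustrated with a toy sketch. This is not the authors' code: the molecule names, template names, and the `APPLICABLE` lookup table are hypothetical placeholders, and the `applies` function merely stands in for real template application (which in practice would be done with a cheminformatics toolkit such as RDKit). Accuracy here means whether the template extracted from the recorded reaction appears among the policy network's top-k predictions.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for real template application (e.g. with RDKit):
# a (molecule, template) pair counts as "applicable" if it appears in this set.
APPLICABLE = {
    ("mol_A", "t_ester"), ("mol_A", "t_amide_alt"), ("mol_A", "t_amide"),
    ("mol_B", "t_buchwald"), ("mol_B", "t_suzuki_alt"),
}


def applies(template: str, molecule: str) -> bool:
    """Return True if the template yields precursors for the molecule (toy lookup)."""
    return (molecule, template) in APPLICABLE


def top_k_accuracy(ranked: Sequence[Sequence[str]], recorded: Sequence[str], k: int) -> float:
    """Fraction of targets whose recorded template is among the top-k predictions."""
    hits = sum(1 for preds, true in zip(ranked, recorded) if true in preds[:k])
    return hits / len(recorded)


def applied_template_counts(
    ranked: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
    can_apply: Callable[[str, str], bool],
) -> List[int]:
    """Per target, count how many of the top-k predicted templates apply successfully."""
    return [sum(1 for t in preds[:k] if can_apply(t, mol)) for preds, mol in zip(ranked, targets)]


targets = ["mol_A", "mol_B"]
recorded = ["t_amide", "t_suzuki"]  # hypothetical templates from the recorded reactions
ranked = [  # hypothetical policy-network predictions, best first
    ["t_ester", "t_amide_alt", "t_amide"],
    ["t_buchwald", "t_suzuki_alt", "t_snar"],
]

print(top_k_accuracy(ranked, recorded, k=1))  # 0.0
print(top_k_accuracy(ranked, recorded, k=3))  # 0.5
print(applied_template_counts(ranked, targets, k=3, can_apply=applies))  # [3, 2]
```

In this toy case the top-1 accuracy is zero, yet each target still has at least two applicable templates that could seed a synthetic route, which is the kind of gap the abstract argues accuracy alone hides.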

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a6b/7012039/19c17c43beac/c9sc04944d-f1.jpg