Thakkar Amol, Chadimová Veronika, Bjerrum Esben Jannik, Engkvist Ola, Reymond Jean-Louis
Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg 431 50 Sweden
Department of Chemistry and Biochemistry, University of Bern Bern CH-3012 Switzerland
Chem Sci. 2021 Jan 22;12(9):3339-3349. doi: 10.1039/d0sc05401a.
Computer-aided synthesis planning (CASP) belongs to a suite of artificial intelligence (AI) based tools that can propose synthesis routes to a wide range of compounds. However, current CASP tools are too slow to screen the synthetic feasibility of millions of generated or enumerated compounds before their potential bioactivity is assessed by virtual screening (VS) workflows. Herein we report a machine learning (ML) based method that classifies whether or not the CASP tool AiZynthFinder can identify a synthetic route for a particular compound. The resulting ML models return a retrosynthetic accessibility score (RAscore) for any molecule of interest and compute at least 4500 times faster than the retrosynthetic analysis performed by the underlying CASP tool. The RAscore should be useful for pre-screening millions of virtual molecules from enumerated databases or generative models for synthetic accessibility, yielding higher-quality databases for virtual screening of biological activity.
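The abstract does not describe the model itself, but the intended use of a fast classifier score as a filter ahead of virtual screening can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: `prescreen`, `batched`, and `toy_scorer` are hypothetical names, the 0.5 threshold is an arbitrary default, and `toy_scorer` is a placeholder standing in for a trained RAscore model (it is not RAscore and has no chemical meaning).

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical type: a batch scorer mapping SMILES strings to scores in [0, 1],
# read as the probability that the CASP tool would find a route. A trained
# RAscore model would fill this role; toy_scorer below is only a placeholder.
BatchScorer = Callable[[Sequence[str]], List[float]]


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items from `items`."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def prescreen(
    smiles: Iterable[str],
    score_batch: BatchScorer,
    threshold: float = 0.5,
    batch_size: int = 1024,
) -> Iterator[Tuple[str, float]]:
    """Stream (smiles, score) pairs whose score meets `threshold`.

    Molecules are scored in batches, so a large enumerated library never has
    to be held in memory at once; only the survivors are passed on to the
    much more expensive virtual screening or full CASP step.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1]")
    for batch in batched(smiles, batch_size):
        scores = score_batch(batch)
        if len(scores) != len(batch):
            raise ValueError("scorer must return one score per molecule")
        for smi, score in zip(batch, scores):
            if score >= threshold:
                yield smi, score


def toy_scorer(batch: Sequence[str]) -> List[float]:
    """Placeholder scorer, NOT RAscore: shorter SMILES simply score higher."""
    return [max(0.0, 1.0 - len(s) / 40.0) for s in batch]


if __name__ == "__main__":
    library = ["CCO", "c1ccccc1O", "CC(=O)Nc1ccc(O)cc1", "C" * 45]
    for smi, score in prescreen(library, toy_scorer, threshold=0.5, batch_size=2):
        print(f"{smi}\t{score:.2f}")
```

Swapping `toy_scorer` for a real trained classifier leaves `prescreen` unchanged, which is the point of keeping the scorer pluggable. The batching helper is written out by hand for older Python versions; Python 3.12 and later offer `itertools.batched`, which yields tuples instead of lists.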