SMILES-based deep generative scaffold decorator for de-novo drug design.

Author information

Arús-Pous Josep, Patronov Atanas, Bjerrum Esben Jannik, Tyrchan Christian, Reymond Jean-Louis, Chen Hongming, Engkvist Ola

Affiliations

Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden.

Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland.

Publication information

J Cheminform. 2020 May 29;12(1):38. doi: 10.1186/s13321-020-00441-8.

Abstract

Molecular generative models trained with small sets of molecules represented as SMILES strings can generate large regions of the chemical space. Unfortunately, due to the sequential nature of SMILES strings, these models are not able to generate molecules given a scaffold (i.e., a partially built molecule with explicit attachment points). Herein we report a new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained from any arbitrary molecular set. This approach is possible thanks to a new molecular set pre-processing algorithm that exhaustively slices all possible combinations of acyclic bonds of every molecule, combinatorially obtaining a large number of scaffolds with their respective decorations. Moreover, the algorithm serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets. Two examples showcasing the potential of the architecture in medicinal and synthetic chemistry are described. First, models were trained with a training set obtained from a small set of dopamine receptor D2 (DRD2) active modulators and were able to meaningfully decorate a wide range of scaffolds and obtain molecular series predicted active on DRD2. Second, a larger set of drug-like molecules from ChEMBL was selectively sliced using synthetic chemistry constraints (RECAP rules). In this case, the resulting scaffolds with decorations were filtered to keep only those that included fragment-like decorations. This filtering process allowed models trained with this dataset to selectively decorate diverse scaffolds with fragments that were generally predicted to be synthesizable and attachable to the scaffold using known synthetic approaches. In both cases, the models were able to decorate molecules using specific knowledge without the need to add it with other techniques, such as reinforcement learning. We envision that this architecture will become a useful addition to the existing architectures for de novo molecular generation.
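The slicing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the molecule is reduced to a plain graph of atom indices and bonds (no SMILES parsing, no chemistry toolkit), and the names `components`, `acyclic_bonds`, `slice_molecule`, `toy_bonds`, and the `max_cuts` limit are hypothetical. The sketch assumes that a valid slicing has one fragment (the scaffold) carrying every attachment point while each remaining fragment (a decoration) carries exactly one.

```python
from itertools import combinations


def components(n_atoms, bonds):
    """Return the connected components of the graph as frozensets of atom indices."""
    adj = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps


def acyclic_bonds(n_atoms, bonds):
    """A bond is acyclic (not part of a ring) if removing it disconnects the graph."""
    base = len(components(n_atoms, bonds))
    return [b for b in bonds
            if len(components(n_atoms, [x for x in bonds if x != b])) > base]


def slice_molecule(n_atoms, bonds, max_cuts=3):
    """Yield (scaffold, decorations) for every valid combination of cut bonds.

    Assumption of this sketch: the scaffold is the fragment that holds one end
    of every cut bond, and each decoration holds exactly one attachment point.
    """
    candidates = acyclic_bonds(n_atoms, bonds)
    for k in range(1, max_cuts + 1):
        for cuts in combinations(candidates, k):
            frags = components(n_atoms, [b for b in bonds if b not in cuts])
            # Number of attachment points (cut-bond ends) in each fragment.
            points = {f: sum((a in f) + (b in f) for a, b in cuts) for f in frags}
            for scaffold in frags:
                if points[scaffold] != k:
                    continue
                decorations = [f for f in frags if f != scaffold]
                if all(points[d] == 1 for d in decorations):
                    yield scaffold, decorations


# Toy graph: a six-membered ring (atoms 0-5), one substituent atom (6) on atom 0,
# and a two-atom chain (7-8) on atom 3.
toy_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
             (0, 6), (3, 7), (7, 8)]

if __name__ == "__main__":
    for scaffold, decorations in slice_molecule(9, toy_bonds):
        print(sorted(scaffold), [sorted(d) for d in decorations])
```

Even this nine-atom toy graph yields several scaffold and decoration pairs from a single molecule, which is the sense in which the abstract describes the slicing as a data augmentation technique. A real pipeline would operate on molecules parsed from SMILES and would likely apply additional filters that this sketch omits.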

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9186/7260788/c409246dcf06/13321_2020_441_Fig1_HTML.jpg
