SMILES-based deep generative scaffold decorator for de-novo drug design.

Authors

Arús-Pous Josep, Patronov Atanas, Bjerrum Esben Jannik, Tyrchan Christian, Reymond Jean-Louis, Chen Hongming, Engkvist Ola

Affiliations

Molecular AI, Hit Discovery, Discovery Sciences, BioPharmaceutical R&D, AstraZeneca, Gothenburg, Sweden.

Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland.

Publication information

J Cheminform. 2020 May 29;12(1):38. doi: 10.1186/s13321-020-00441-8.

DOI:10.1186/s13321-020-00441-8

PMID:33431013

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC7260788/

Abstract

Molecular generative models trained with small sets of molecules represented as SMILES strings can generate large regions of the chemical space. Unfortunately, due to the sequential nature of SMILES strings, these models are not able to generate molecules given a scaffold (i.e., partially-built molecules with explicit attachment points). Herein we report a new SMILES-based molecular generative architecture that generates molecules from scaffolds and can be trained from any arbitrary molecular set. This approach is possible thanks to a new molecular set pre-processing algorithm that exhaustively slices all possible combinations of acyclic bonds of every molecule, combinatorically obtaining a large number of scaffolds with their respective decorations. Moreover, it serves as a data augmentation technique and can be readily coupled with randomized SMILES to obtain even better results with small sets. Two examples showcasing the potential of the architecture in medicinal and synthetic chemistry are described: First, models were trained with a training set obtained from a small set of Dopamine Receptor D2 (DRD2) active modulators and were able to meaningfully decorate a wide range of scaffolds and obtain molecular series predicted active on DRD2. Second, a larger set of drug-like molecules from ChEMBL was selectively sliced using synthetic chemistry constraints (RECAP rules). In this case, the resulting scaffolds with decorations were filtered only to allow those that included fragment-like decorations. This filtering process allowed models trained with this dataset to selectively decorate diverse scaffolds with fragments that were generally predicted to be synthesizable and attachable to the scaffold using known synthetic approaches. In both cases, the models were already able to decorate molecules using specific knowledge without the need to add it with other techniques, such as reinforcement learning. 
We envision that this architecture will become a useful addition to the already existent architectures for de novo molecular generation.
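
The slicing step described in the abstract can be illustrated with a minimal pure-Python sketch on a toy molecular graph. It is not the authors' implementation. The graph encoding (atoms as integer indices, bonds as index pairs), the function names `components`, `acyclic_bonds`, and `slice_molecule`, the `max_cuts` limit, and the rule that a valid slice has one scaffold carrying every attachment point while each decoration carries exactly one are all assumptions made for illustration. Acyclic bonds are approximated here as bridges, that is, bonds whose removal disconnects the graph.

```python
from itertools import combinations


def components(n_atoms, bonds):
    """Return the connected components of the graph as frozensets of atom indices."""
    adj = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps


def acyclic_bonds(n_atoms, bonds):
    """Bonds that are not part of a ring: removing one disconnects the molecule.

    Assumes the input molecule is a single connected graph.
    """
    return [b for b in bonds
            if len(components(n_atoms, [x for x in bonds if x != b])) > 1]


def slice_molecule(n_atoms, bonds, max_cuts=2):
    """Enumerate (scaffold, decorations) pairs over all combinations of acyclic bonds.

    Assumed validity rule: the scaffold fragment touches every cut bond,
    and each remaining fragment (a decoration) touches exactly one.
    """
    rows = set()
    cuttable = acyclic_bonds(n_atoms, bonds)
    for k in range(1, max_cuts + 1):
        for cut in combinations(cuttable, k):
            kept = [b for b in bonds if b not in cut]
            frags = components(n_atoms, kept)
            # Attachment points per fragment = number of cut-bond ends it contains.
            points = {f: sum((a in f) + (b in f) for a, b in cut) for f in frags}
            for scaffold in frags:
                others = [f for f in frags if f != scaffold]
                if points[scaffold] == k and all(points[f] == 1 for f in others):
                    rows.add((scaffold, frozenset(others)))
    return rows


# Toy example: a six-membered ring (atoms 0-5) with one substituent atom
# on ring atom 0 (atom 6) and another on ring atom 3 (atom 7).
N_ATOMS = 8
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6), (3, 7)]

rows = slice_molecule(N_ATOMS, BONDS)
print(len(rows))  # 5
```

Even this two-substituent toy graph yields five scaffold/decoration pairs from a single molecule, which illustrates why the abstract describes the slicing as a data augmentation technique. A real pipeline would operate on parsed SMILES with a cheminformatics toolkit and would mark the attachment points explicitly in the scaffold string.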

Figure 1 (from the original article): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9186/7260788/c409246dcf06/13321_2020_441_Fig1_HTML.jpg
