Randomized SMILES strings improve the quality of molecular generative models.

Authors

Arús-Pous Josep, Johansson Simon Viet, Prykhodko Oleksii, Bjerrum Esben Jannik, Tyrchan Christian, Reymond Jean-Louis, Chen Hongming, Engkvist Ola

Affiliations

Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden.

Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland.

Publication details

J Cheminform. 2019 Nov 21;11(1):71. doi: 10.1186/s13321-019-0393-0.

DOI: 10.1186/s13321-019-0393-0
PMID: 33430971
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6873550/
Abstract

Recurrent Neural Networks (RNNs) trained with a set of molecules represented as unique (canonical) SMILES strings have shown the capacity to create large chemical spaces of valid and meaningful structures. Herein we perform an extensive benchmark on models trained with subsets of GDB-13 of different sizes (1 million, 10,000 and 1000), with different SMILES variants (canonical, randomized and DeepSMILES), with two different recurrent cell types (LSTM and GRU) and with different hyperparameter combinations. To guide the benchmarks, new metrics were developed that define how well a model has generalized the training set. The generated chemical space is evaluated with respect to its uniformity, closedness and completeness. Results show that models that use LSTM cells trained with 1 million randomized SMILES, a non-unique molecular string representation, are able to generalize to larger chemical spaces than the other approaches and more accurately represent the target chemical space. Specifically, a model trained with randomized SMILES was able to generate almost all molecules from GDB-13 with a quasi-uniform probability. Models trained with smaller samples show an even bigger improvement when trained with randomized SMILES. Additionally, models were trained on molecules obtained from ChEMBL, and the results again illustrate that training with randomized SMILES leads to models with a better representation of the drug-like chemical space. Namely, the model trained with randomized SMILES was able to generate at least double the number of unique molecules with the same distribution of properties compared to one trained with canonical SMILES.
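
The abstract contrasts canonical SMILES, which give one string per molecule, with randomized SMILES. In randomized SMILES the same molecule is written many ways by varying the starting atom and the order in which branches are visited. The sketch below is a toy illustration of that idea under stated assumptions and is not the authors' code. It handles only acyclic molecules with single bonds and organic-subset atom symbols, so ring-closure digits, bond symbols and brackets never arise. The helper name `random_tree_smiles` is hypothetical. A real pipeline would use a cheminformatics toolkit instead, for example the `doRandom` option of RDKit's `Chem.MolToSmiles`.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def random_tree_smiles(
    atoms: Sequence[str],
    bonds: Sequence[Tuple[int, int]],
    rng: random.Random,
) -> str:
    """Toy randomized-SMILES writer (hypothetical helper, not the paper's code).

    Assumes an acyclic molecule with only single bonds and organic-subset atom
    symbols, so no ring-closure digits, bond symbols or brackets are needed.
    """
    n = len(atoms)
    if n == 0:
        raise ValueError("molecule has no atoms")
    adj: Dict[int, List[int]] = {i: [] for i in range(n)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)

    # A graph is a tree iff it is connected and has exactly n - 1 edges.
    seen = {0}
    stack = [0]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    if len(bonds) != n - 1 or len(seen) != n:
        raise ValueError("toy writer only handles connected, acyclic molecules")

    def walk(node: int, parent: Optional[int]) -> str:
        children = [nb for nb in adj[node] if nb != parent]
        rng.shuffle(children)  # random branch order
        text = atoms[node]
        for i, child in enumerate(children):
            branch = walk(child, node)
            # All branches but the last go in parentheses; the last continues the chain.
            text += branch if i == len(children) - 1 else f"({branch})"
        return text

    return walk(rng.randrange(n), None)  # random starting atom


if __name__ == "__main__":
    # Ethanol as a graph: C0-C1-O2
    demo_rng = random.Random(0)
    variants = {random_tree_smiles(["C", "C", "O"], [(0, 1), (1, 2)], demo_rng) for _ in range(20)}
    # every entry is one of: CCO, OCC, C(C)O, C(O)C
    print(sorted(variants))
```

For ethanol, the function can only return `CCO`, `OCC`, `C(C)O` or `C(O)C`, and all four are valid spellings of the same molecule. A canonical writer would always emit exactly one of them. Training on many such spellings per molecule is the data representation whose effect the abstract reports.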

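
The abstract evaluates the generated chemical space by uniformity, closedness and completeness, but it does not define these terms. The following sketch uses simple set-based definitions, and these definitions are assumptions that may differ from the paper's exact formulas:

- Completeness is the fraction of the target set (e.g. GDB-13) sampled at least once.
- Closedness is the fraction of distinct generated molecules that fall inside the target set.
- The uniformity ratio compares the observed number of distinct in-target molecules with the count expected from an ideal uniform sampler, N(1 - (1 - 1/N)^k) for k draws with replacement from N molecules.

All function names are hypothetical.

```python
import math
from typing import Iterable, Set


def completeness(generated: Iterable[str], target: Set[str]) -> float:
    """Fraction of the target space sampled at least once (assumed definition)."""
    if not target:
        raise ValueError("target space must be non-empty")
    return len(set(generated) & target) / len(target)


def closedness(generated: Iterable[str], target: Set[str]) -> float:
    """Fraction of distinct generated molecules lying inside the target space (assumed definition)."""
    unique = set(generated)
    return len(unique & target) / len(unique) if unique else 0.0


def expected_unique_uniform(n_target: int, n_draws: int) -> float:
    """Expected number of distinct molecules after n_draws uniform draws with
    replacement from n_target molecules: N * (1 - (1 - 1/N)**k)."""
    if n_target <= 0 or n_draws < 0:
        raise ValueError("n_target must be positive and n_draws non-negative")
    if n_target == 1:
        return 1.0 if n_draws > 0 else 0.0
    # (1 - 1/N)**k == exp(k * log1p(-1/N)); expm1/log1p keep precision for large N
    return -n_target * math.expm1(n_draws * math.log1p(-1.0 / n_target))


def uniformity_ratio(generated: Iterable[str], target: Set[str]) -> float:
    """Observed distinct in-target molecules divided by the uniform-sampler
    expectation for the same number of in-target draws (hypothetical metric)."""
    hits = [s for s in generated if s in target]
    if not hits:
        return 0.0
    return len(set(hits)) / expected_unique_uniform(len(target), len(hits))
```

A ratio close to 1 is consistent with quasi-uniform sampling of the target space. A model that keeps returning a few favoured molecules produces fewer distinct hits for the same number of draws, and its ratio falls well below 1.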

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8921/6873550/91c76f675468/13321_2019_393_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8921/6873550/3076e441216e/13321_2019_393_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8921/6873550/fd2c3ef3b3f9/13321_2019_393_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8921/6873550/9b6410ea94d8/13321_2019_393_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8921/6873550/1710f815a047/13321_2019_393_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8921/6873550/c1aec2dd229f/13321_2019_393_Fig6_HTML.jpg

Similar articles

1
Randomized SMILES strings improve the quality of molecular generative models.
J Cheminform. 2019 Nov 21;11(1):71. doi: 10.1186/s13321-019-0393-0.
2
SMILES-based deep generative scaffold decorator for de-novo drug design.
J Cheminform. 2020 May 29;12(1):38. doi: 10.1186/s13321-020-00441-8.
3
GEN: highly efficient SMILES explorer using autodidactic generative examination networks.
J Cheminform. 2020 Apr 10;12(1):22. doi: 10.1186/s13321-020-00425-8.
4
Exploring the GDB-13 chemical space using deep generative models.
J Cheminform. 2019 Mar 12;11(1):20. doi: 10.1186/s13321-019-0341-z.
5
Improving Chemical Autoencoder Latent Space and Molecular Generation Diversity with Heteroencoders.
Biomolecules. 2018 Oct 30;8(4):131. doi: 10.3390/biom8040131.
6
Training recurrent neural networks as generative neural networks for molecular structures: how does it impact drug discovery?
Expert Opin Drug Discov. 2022 Oct;17(10):1071-1079. doi: 10.1080/17460441.2023.2134340. Epub 2022 Oct 17.
7
Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI.
J Cheminform. 2012 Sep 18;4(1):22. doi: 10.1186/1758-2946-4-22.
8
Bidirectional Molecule Generation with Recurrent Neural Networks.
J Chem Inf Model. 2020 Mar 23;60(3):1175-1183. doi: 10.1021/acs.jcim.9b00943. Epub 2020 Jan 16.
9
De Novo Molecule Design by Translating from Reduced Graphs to SMILES.
J Chem Inf Model. 2019 Mar 25;59(3):1136-1146. doi: 10.1021/acs.jcim.8b00626. Epub 2018 Dec 21.
10
Adversarial Threshold Neural Computer for Molecular de Novo Design.
Mol Pharm. 2018 Oct 1;15(10):4386-4397. doi: 10.1021/acs.molpharmaceut.7b01137. Epub 2018 Mar 30.

Cited by

1
Going beyond SMILES enumeration for data augmentation in generative drug discovery.
Digit Discov. 2025 Aug 14. doi: 10.1039/d5dd00028a.
2
Generative Deep Learning for de Novo Drug Design─A Chemical Space Odyssey.
J Chem Inf Model. 2025 Jul 28;65(14):7352-7372. doi: 10.1021/acs.jcim.5c00641. Epub 2025 Jul 9.
3
In-silico 3D molecular editing through physics-informed and preference-aligned generative foundation models.
