Molecular generation by Fast Assembly of (Deep)SMILES fragments.

Authors

Berenger Francois, Tsuda Koji

Affiliation

Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwa-no-ha, Kashiwa, Chiba, 277-8561, Japan.

Publication

J Cheminform. 2021 Nov 14;13(1):88. doi: 10.1186/s13321-021-00566-4.

DOI:10.1186/s13321-021-00566-4
PMID:34775976
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8591910/
Abstract

BACKGROUND

In recent years, in silico molecular design has been regaining interest. To generate molecules with optimized properties on a computer, scoring functions can be coupled with a molecular generator to design novel molecules with a desired property profile.

RESULTS

In this article, a simple method is described that, given a molecular training set, generates only valid molecules at high frequency ([Formula: see text] molecule/s using a single CPU core). The proposed method generates diverse SMILES (or DeepSMILES) encoded molecules while also showing some propensity for matching the training set distribution. When working with DeepSMILES, the method reaches peak performance ([Formula: see text] molecule/s) because it relies almost exclusively on string operations. The "Fast Assembly of SMILES Fragments" software is released as open source at https://github.com/UnixJunkie/FASMIFRA. Experiments on speed, training set distribution matching, and molecular diversity, as well as benchmarks against several other methods, are also reported.
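To illustrate why assembling molecules from string fragments can be cheap, here is a minimal Python sketch of the general idea. It is not the FASMIFRA algorithm: the fragment pools `SEEDS`, `LINKERS`, and `CAPS`, the `*` attachment marker, and the `assemble` function are all hypothetical, and the fragments are hand-picked rather than learned from a training set. Each `*` sits outside any open ring, so substituting a fragment never collides with a ring-closure digit. Chemical validity is not checked here; a cheminformatics toolkit would be needed for that.

```python
import random

# Hypothetical toy fragment pools (not taken from the paper).
# "*" marks an open attachment point; a fragment bonds to its parent
# through its first atom. No "*" is placed inside an open ring.
SEEDS = ["C(*)C(*)=O", "N(*)C(*)=O", "C(*)c1ccccc1"]
LINKERS = ["C(*)=O", "CC(*)C", "N(*)C", "OC(*)C"]      # keep one "*" open
CAPS = ["C", "N", "O", "F", "Cl", "c1ccccc1"]          # close a "*"


def assemble(rng, max_growth=4):
    """Fill attachment points left to right using only string operations."""
    smiles = rng.choice(SEEDS)
    growth = 0  # number of linker fragments used so far
    while "*" in smiles:
        # Once the growth budget is spent, only caps are allowed,
        # so the number of open attachment points strictly decreases.
        pool = LINKERS + CAPS if growth < max_growth else CAPS
        fragment = rng.choice(pool)
        if "*" in fragment:
            growth += 1
        # Replace only the leftmost open attachment point.
        smiles = smiles.replace("*", fragment, 1)
    return smiles


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(assemble(rng))
```

Because every step is a string substitution, the cost per molecule stays small. The real method additionally has to derive fragments from the training set and keep attachment points chemically compatible, which this sketch leaves out.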

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/73e5/8591910/226508df3165/13321_2021_566_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/73e5/8591910/5ad2de1687c9/13321_2021_566_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/73e5/8591910/1094f912f924/13321_2021_566_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/73e5/8591910/a4c113e47b9e/13321_2021_566_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/73e5/8591910/dab2910eb4d8/13321_2021_566_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/73e5/8591910/efb8c8e6ed3d/13321_2021_566_Fig6_HTML.jpg
