
Unlocking comprehensive molecular design across all scenarios with large language model and unordered chemical language.

Authors

Yue Jie, Peng Bingxin, Chen Yu, Jin Jieyu, Zhao Xinda, Shen Chao, Ji Xiangyang, Hsieh Chang-Yu, Song Jianfei, Hou Tingjun, Deng Yafeng, Wang Jike

Affiliations

College of Information Engineering, Hebei University of Architecture, Zhangjiakou 075132, Hebei, China.

CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, China.

Publication information

Chem Sci. 2024 Jul 29;15(34):13727-13740. doi: 10.1039/d4sc03744h. eCollection 2024 Aug 28.

DOI: 10.1039/d4sc03744h
PMID: 39211505
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11352393/

Abstract

Molecular generation stands at the forefront of AI-driven technologies, playing a crucial role in accelerating the development of small molecule drugs. The intricate nature of practical drug discovery necessitates the development of a versatile molecular generation framework that can tackle diverse drug design challenges. However, existing methodologies often struggle to encompass all aspects of small molecule drug design, particularly those rooted in language models, especially in tasks like linker design, due to the autoregressive nature of large language model-based approaches. To empower a language model for a wider range of molecular design tasks, we introduce an unordered simplified molecular-input line-entry system based on fragments (FU-SMILES). Building upon this foundation, we propose FragGPT, a universal fragment-based molecular generation model. Initially pretrained on extensive molecular datasets, FragGPT utilizes FU-SMILES to facilitate efficient generation across various practical applications, such as molecule design, linker design, R-group exploration, scaffold hopping, and side chain optimization. Furthermore, we integrate conditional generation and reinforcement learning (RL) methodologies to ensure that the generated molecules possess multiple desired biological and physicochemical properties. Experimental results across diverse scenarios validate FragGPT's superiority in generating molecules with enhanced properties and novel structures, outperforming existing state-of-the-art models. Moreover, its robust drug design capability is further corroborated through real-world drug design cases.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91fe/11352393/efa2fca62f9a/d4sc03744h-f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91fe/11352393/747b92bb784f/d4sc03744h-f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91fe/11352393/6a02492c1037/d4sc03744h-f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91fe/11352393/ca78d7aafa52/d4sc03744h-f3.jpg


