
GexMolGen: cross-modal generation of hit-like molecules via large language model encoding of gene expression signatures.

Affiliations

Department of Automation, School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang District, Shanghai 200240, China.

Department of Rheumatology, Ren Ji Hospital, Shanghai Jiao Tong University School of Medicine, No. 1630 East Road, Pudong New Area, Shanghai 200127, China.

Publication information

Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae525.

DOI: 10.1093/bib/bbae525
PMID: 39470305
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11514063/
Abstract

Designing de novo molecules with specific biological activity is an essential task since it holds the potential to bypass the exploration of target genes, which is an initial step in the modern drug discovery paradigm. However, traditional methods mainly screen molecules by comparing the desired molecular effects within the documented experimental results. The data set limits this process, and it is hard to conduct direct cross-modal comparisons. Therefore, we propose a solution based on cross-modal generation called GexMolGen (Gene Expression-based Molecule Generator), which generates hit-like molecules using gene expression signatures alone. These signatures are calculated by inputting control and desired gene expression states. Our model GexMolGen adopts a "first-align-then-generate" strategy, aligning the gene expression signatures and molecules within a mapping space, ensuring a smooth cross-modal transition. The transformed molecular embeddings are then decoded into molecular graphs. In addition, we employ an advanced single-cell large language model for input flexibility and pre-train a scaffold-based molecular model to ensure that all generated molecules are 100% valid. Empirical results show that our model can produce molecules highly similar to known references, whether feeding in- or out-of-domain transcriptome data. Furthermore, it can also serve as a reliable tool for cross-modal screening.
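As a rough illustration of the two ideas in the abstract (a signature computed from control and desired expression states, and cross-modal screening in a shared mapping space), here is a minimal NumPy sketch. It is not the authors' implementation. The abstract does not give the signature formula, so the per-gene difference used here is an assumption; the function names are hypothetical; and the toy embeddings stand in for the outputs of the single-cell language model and the molecular encoder.

```python
import numpy as np


def expression_signature(control, desired):
    """Assumed signature: per-gene difference between desired and control states."""
    control = np.asarray(control, dtype=float)
    desired = np.asarray(desired, dtype=float)
    return desired - control


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length (eps guards against zero vectors)."""
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def rank_molecules(signature_embedding, molecule_embeddings):
    """Rank candidate molecules by cosine similarity to the signature embedding.

    Both inputs are assumed to already live in the same aligned space.
    Returns candidate indices (best first) and their similarity scores.
    """
    query = l2_normalize(signature_embedding)
    candidates = l2_normalize(molecule_embeddings)
    scores = candidates @ query
    order = np.argsort(-scores)
    return order, scores[order]


# Toy expression values for three genes (illustrative only).
control = [1.0, 2.0, 3.0]
desired = [2.0, 2.0, 1.0]
signature = expression_signature(control, desired)

# Stand-in for encoder output: the signature itself is used as its embedding,
# and three made-up vectors play the role of molecule embeddings.
molecule_embeddings = np.array([
    [-1.0, 0.0, 2.0],   # points away from the signature
    [1.0, 0.0, -2.0],   # same direction as the signature
    [0.0, 1.0, 0.0],    # orthogonal to the signature
])
order, scores = rank_molecules(signature, molecule_embeddings)
print(order, scores)
```

In a real pipeline, the ranking step would operate on learned embeddings after the alignment stage; cosine similarity is one common choice for comparing vectors in such a space, not necessarily the one used in the paper.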

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e74/11514063/0f72d0880eb0/bbae525f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e74/11514063/dd2d6eeca9c0/bbae525f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e74/11514063/8bc1d79e3f79/bbae525f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e74/11514063/7ba466e28bed/bbae525f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e74/11514063/b2a2bae9e26a/bbae525f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7e74/11514063/3de046e5b5d4/bbae525f6.jpg
