FAME: Fragment-based Conditional Molecular Generation for Phenotypic Drug Discovery

Authors

Pham Thai-Hoang, Xie Lei, Zhang Ping

Affiliations

Department of Computer Science and Engineering, The Ohio State University, Columbus, USA.

Department of Computer Science, Hunter College, The City University of New York, New York City, USA; Neuroscience, Weill Cornell Medicine, New York City, USA.

Publication information

Proc SIAM Int Conf Data Min. 2022;2022:720-728. doi: 10.1137/1.9781611977172.81.

DOI:10.1137/1.9781611977172.81

PMID:35509686

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9061137/

Abstract

Molecular design is a key challenge in drug discovery due to the complexity of chemical space. With the availability of molecular datasets and advances in machine learning, many deep generative models have been proposed for generating novel molecules with desired properties. However, most existing models focus only on molecular distribution learning and target-based molecular design, which limits their potential in real-world applications. In drug discovery, phenotypic molecular design has advantages over target-based molecular design, especially in first-in-class drug discovery. In this work, we propose the first deep graph generative model (FAME) targeting phenotypic molecular design, in particular gene expression-based molecular design. FAME leverages a conditional variational autoencoder framework to learn the conditional distribution of molecules given gene expression profiles. However, this distribution is difficult to learn due to the complexity of the molecular space and the noise in gene expression data. To tackle these issues, a gene expression denoising (GED) model that employs a contrastive objective function is first proposed to reduce noise in gene expression data. FAME is then designed to treat molecules as sequences of fragments and to learn to generate these fragments in an autoregressive manner. By leveraging this fragment-based generation strategy and the denoised gene expression profiles, FAME can generate novel molecules with a high validity rate and desired biological activity. The experimental results show that FAME outperforms existing methods, including both SMILES-based and graph-based deep generative models, for phenotypic molecular design. Furthermore, the mechanism for reducing noise in gene expression data proposed in our study can be applied to omics data modeling in general to facilitate phenotypic drug discovery.
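As a reading aid, the conditional variational autoencoder and the autoregressive fragment generation described in the abstract can be written in their generic textbook form. This is a sketch under stated assumptions, not the exact loss used in the paper: the symbols $x = (f_1, \dots, f_T)$ (a molecule as a sequence of fragments), $c$ (the denoised gene expression profile), $z$ (the latent variable), and the parameters $\theta$, $\phi$ are notation introduced here for illustration.

```latex
% Generic conditional VAE lower bound (assumed standard form, not taken from the paper)
\log p_\theta(x \mid c) \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x, c)}\!\left[ \log p_\theta(x \mid z, c) \right]
  \;-\; \mathrm{KL}\!\left( q_\phi(z \mid x, c) \,\|\, p_\theta(z \mid c) \right)

% Autoregressive factorization of the decoder over fragments f_1, ..., f_T
p_\theta(x \mid z, c) \;=\; \prod_{t=1}^{T} p_\theta\!\left( f_t \mid f_{<t},\, z,\, c \right)
```

Under this reading, training maximizes the lower bound, and generation samples $z$ from the prior for a given $c$, then decodes one fragment at a time.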


Similar articles

Conditional Molecular Design with Deep Generative Models.

J Chem Inf Model. 2019 Jan 28;59(1):43-52. doi: 10.1021/acs.jcim.8b00263. Epub 2018 Jul 27.

Fragment-based deep molecular generation using hierarchical chemical graph representation and multi-resolution graph variational autoencoder.

Mol Inform. 2023 May;42(5):e2200215. doi: 10.1002/minf.202200215. Epub 2023 Mar 17.

Multi-objective de novo drug design with conditional graph generative model.

J Cheminform. 2018 Jul 24;10(1):33. doi: 10.1186/s13321-018-0287-6.

MGCVAE: Multi-Objective Inverse Design via Molecular Graph Conditional Variational Autoencoder.

J Chem Inf Model. 2022 Jun 27;62(12):2943-2950. doi: 10.1021/acs.jcim.2c00487. Epub 2022 Jun 6.

CMGN: a conditional molecular generation net to design target-specific molecules with desired properties.

Brief Bioinform. 2023 Jul 20;24(4). doi: 10.1093/bib/bbad185.

Efficient learning of non-autoregressive graph variational autoencoders for molecular graph generation.

J Cheminform. 2019 Nov 21;11(1):70. doi: 10.1186/s13321-019-0396-x.

druGAN: An Advanced Generative Adversarial Autoencoder Model for de Novo Generation of New Molecules with Desired Molecular Properties in Silico.

Mol Pharm. 2017 Sep 5;14(9):3098-3104. doi: 10.1021/acs.molpharmaceut.7b00346. Epub 2017 Aug 4.

Geometry-Based Molecular Generation With Deep Constrained Variational Autoencoder.

IEEE Trans Neural Netw Learn Syst. 2024 Apr;35(4):4852-4861. doi: 10.1109/TNNLS.2022.3147790. Epub 2024 Apr 4.

FSM-DDTR: End-to-end feedback strategy for multi-objective De Novo drug design using transformers.

Comput Biol Med. 2023 Sep;164:107285. doi: 10.1016/j.compbiomed.2023.107285. Epub 2023 Jul 31.

Cited by

Deep representation learning of chemical-induced transcriptional profile for phenotype-based drug discovery.

Nat Commun. 2024 Jun 25;15(1):5378. doi: 10.1038/s41467-024-49620-3.

TransGEM: a molecule generation model based on Transformer with gene expression data.

Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae189.

Developing an Improved Cycle Architecture for AI-Based Generation of New Structures Aimed at Drug Discovery.

Molecules. 2024 Mar 27;29(7):1499. doi: 10.3390/molecules29071499.

References

Molecular Generation for Desired Transcriptome Changes With Adversarial Autoencoders.

Front Pharmacol. 2020 Apr 17;11:269. doi: 10.3389/fphar.2020.00269. eCollection 2020.

De novo generation of hit-like molecules from gene expression signatures using artificial intelligence.

Nat Commun. 2020 Jan 3;11(1):10. doi: 10.1038/s41467-019-13807-w.

Fréchet ChemNet Distance: A Metric for Generative Models for Molecules in Drug Discovery.

J Chem Inf Model. 2018 Sep 24;58(9):1736-1741. doi: 10.1021/acs.jcim.8b00234. Epub 2018 Aug 28.

Deep reinforcement learning for de novo drug design.

Sci Adv. 2018 Jul 25;4(7):eaap7885. doi: 10.1126/sciadv.aap7885. eCollection 2018 Jul.

Multi-objective de novo drug design with conditional graph generative model.

J Cheminform. 2018 Jul 24;10(1):33. doi: 10.1186/s13321-018-0287-6.

Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules.

ACS Cent Sci. 2018 Feb 28;4(2):268-276. doi: 10.1021/acscentsci.7b00572. Epub 2018 Jan 12.

Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks.

ACS Cent Sci. 2018 Jan 24;4(1):120-131. doi: 10.1021/acscentsci.7b00512. Epub 2017 Dec 28.

A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles.

Cell. 2017 Nov 30;171(6):1437-1452.e17. doi: 10.1016/j.cell.2017.10.049.

Opportunities and challenges in phenotypic drug discovery: an industry perspective.

Nat Rev Drug Discov. 2017 Aug;16(8):531-543. doi: 10.1038/nrd.2017.111. Epub 2017 Jul 7.

A dataset of images and morphological profiles of 30 000 small-molecule treatments using the Cell Painting assay.

Gigascience. 2017 Dec 1;6(12):1-5. doi: 10.1093/gigascience/giw014.
