Wei Lai, Fu Nihang, Song Yuqi, Wang Qian, Hu Jianjun
Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, 29201, USA.
Department of Chemistry and Biochemistry, University of South Carolina, Columbia, SC, 29201, USA.
J Cheminform. 2023 Sep 25;15(1):88. doi: 10.1186/s13321-023-00759-z.
Self-supervised neural language models have recently found wide applications in the generative design of organic molecules and protein sequences, as well as in representation learning for downstream structure classification and functional prediction. However, most existing deep learning models for molecule design require large datasets and have black-box architectures, which makes it difficult to interpret their design logic. Here we propose the Generative Molecular Transformer (GMTransformer), a probabilistic neural network model for the generative design of molecules. Our model is built on the blank-filling language model originally developed for text processing, which has demonstrated unique advantages in learning "molecule grammars" with high-quality generation, interpretability, and data efficiency. Benchmarked on the MOSES datasets, our models achieve high novelty and Scaf (scaffold similarity) scores compared to other baselines. The probabilistic generation steps hold potential for tinkering with molecule design, as they can recommend how to modify existing molecules with explanation, guided by the learned implicit molecule chemistry. The source code and datasets can be accessed freely at https://github.com/usccolumbia/GMTransformer.
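To make the blank-filling generation idea concrete, the following is a minimal toy sketch (not the GMTransformer implementation): a canvas starts as a single blank, and each step fills one blank with a token, optionally opening a new blank so the string can keep growing. In the real model these actions are sampled from learned probabilities over SMILES/SELFIES tokens; here the action sequence is scripted by hand purely for illustration.

```python
BLANK = "_"

def fill_blanks(steps, start=(BLANK,)):
    """Apply a scripted sequence of blank-filling actions.

    Each step is (blank_index, token, expand). expand=True inserts a
    fresh blank right after the filled position, mimicking the model's
    decision to keep generating; expand=False simply fills the blank.
    In GMTransformer these choices would come from the learned
    probabilistic language model rather than a fixed script.
    """
    canvas = list(start)
    for idx, token, expand in steps:
        # Locate the idx-th remaining blank on the canvas.
        blanks = [i for i, t in enumerate(canvas) if t == BLANK]
        pos = blanks[idx]
        canvas[pos] = token
        if expand:
            canvas.insert(pos + 1, BLANK)
    # Drop any leftover blanks and return the generated string.
    return "".join(t for t in canvas if t != BLANK)

# Scripted generation of ethanol's SMILES string "CCO":
smiles = fill_blanks([(0, "C", True), (0, "C", True), (0, "O", False)])
print(smiles)  # CCO
```

Because every step is an explicit, local edit to the canvas, the sequence of actions itself serves as an explanation of how the molecule was built, which is the interpretability advantage the abstract refers to.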