Molecular Descriptors Property Prediction Using Transformer-Based Approach.

Authors

Tran Tuan, Ekenna Chinwe

Affiliation

Department of Computer Science, University at Albany, Albany, NY 12203, USA.

Publication information

Int J Mol Sci. 2023 Jul 26;24(15):11948. doi: 10.3390/ijms241511948.

DOI: 10.3390/ijms241511948

PMID: 37569322

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10419034/

Abstract

In this study, we introduce semi-supervised machine learning models designed to predict molecular properties. Our model employs a two-stage approach, involving pre-training and fine-tuning. In particular, our model leverages a substantial amount of labeled and unlabeled data consisting of SMILES strings, a text representation system for molecules. During the pre-training stage, our model uses the Masked Language Model objective, which is widely used in natural language processing, to learn molecular chemical space representations. During the fine-tuning stage, our model is trained on a smaller labeled dataset to tackle specific downstream tasks, such as classification or regression. Preliminary results indicate that our model demonstrates comparable performance to state-of-the-art models on the chosen downstream tasks from MoleculeNet. Additionally, to reduce the computational overhead, we propose a new approach that takes advantage of 3D compound structures to calculate the attention score used in an end-to-end transformer model for predicting anti-malaria drug candidates. The results show that, using the proposed attention score, our end-to-end model achieves performance comparable to that of pre-trained models.
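The pre-training step described in the abstract can be illustrated with a minimal sketch of how masked-language-model inputs might be prepared from SMILES strings. This is not the authors' implementation: the abstract states neither the tokenizer nor the masking rates, so the sketch assumes a simple regex tokenizer and BERT-style masking (15% of tokens selected; of those, 80% replaced by a mask token, 10% by a random vocabulary token, 10% left unchanged). The names `SMILES_TOKEN`, `tokenize`, and `mask_tokens` are hypothetical.

```python
import random
import re

# Assumed tokenizer: bracket atoms, two-letter halogens, two-digit ring
# closures (%NN), then any single character. Not taken from the paper.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")

MASK = "[MASK]"


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return SMILES_TOKEN.findall(smiles)


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (inputs, labels) for masked-language-model training.

    labels[i] holds the original token at each selected position and
    None elsewhere, so a loss would be computed only on selected positions.
    The 80/10/10 split is the BERT convention, assumed here.
    """
    rng = rng or random.Random()
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK  # replace with the mask token
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # replace with a random token
        # otherwise keep the original token in the input
    return inputs, labels


if __name__ == "__main__":
    tokens = tokenize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
    vocab = sorted(set(tokens))
    inputs, labels = mask_tokens(tokens, vocab, rng=random.Random(0))
    print(inputs)
    print(labels)
```

In a full pipeline, the token lists would be mapped to integer ids and fed to a transformer encoder trained to recover the labels at the selected positions; the fine-tuning stage would then reuse the encoder with a task-specific classification or regression head.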


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cb5/10419034/c5d162af476e/ijms-24-11948-g001.jpg

Similar articles

Molecular Descriptors Property Prediction Using Transformer-Based Approach.

Int J Mol Sci. 2023 Jul 26;24(15):11948. doi: 10.3390/ijms241511948.

Pushing the Boundaries of Molecular Property Prediction for Drug Discovery with Multitask Learning BERT Enhanced by SMILES Enumeration.

Research (Wash D C). 2022 Dec 15;2022:0004. doi: 10.34133/research.0004. eCollection 2022.

Complementary multi-modality molecular self-supervised learning via non-overlapping masking for property prediction.

Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae256.

A merged molecular representation learning for molecular properties prediction with a web-based service.

Sci Rep. 2021 May 26;11(1):11028. doi: 10.1038/s41598-021-90259-7.

Descriptor-Free Deep Learning QSAR Model for the Fraction Unbound in Human Plasma.

Mol Pharm. 2023 Oct 2;20(10):4984-4993. doi: 10.1021/acs.molpharmaceut.3c00129. Epub 2023 Sep 1.

MolGPT: Molecular Generation Using a Transformer-Decoder Model.

J Chem Inf Model. 2022 May 9;62(9):2064-2076. doi: 10.1021/acs.jcim.1c00600. Epub 2021 Oct 25.

Transformers-sklearn: a toolkit for medical language understanding with transformer-based models.

BMC Med Inform Decis Mak. 2021 Jul 30;21(Suppl 2):90. doi: 10.1186/s12911-021-01459-0.

NoiseMol: A noise-robusted data augmentation via perturbing noise for molecular property prediction.

J Mol Graph Model. 2023 Jun;121:108454. doi: 10.1016/j.jmgm.2023.108454. Epub 2023 Mar 15.

Learning self-supervised molecular representations for drug-drug interaction prediction.

BMC Bioinformatics. 2024 Jan 30;25(1):47. doi: 10.1186/s12859-024-05643-7.

Can large language models understand molecules?

BMC Bioinformatics. 2024 Jun 26;25(1):225. doi: 10.1186/s12859-024-05847-x.

Cited by

Graph and Multi-Level Sequence Fusion Learning for Predicting the Molecular Activity of BACE-1 Inhibitors.

Int J Mol Sci. 2025 Feb 16;26(4):1681. doi: 10.3390/ijms26041681.

Transformer-based models for chemical SMILES representation: A comprehensive literature review.

Heliyon. 2024 Oct 9;10(20):e39038. doi: 10.1016/j.heliyon.2024.e39038. eCollection 2024 Oct 30.

Predicting blood-brain barrier permeability of molecules with a large language model and machine learning.

Sci Rep. 2024 Jul 9;14(1):15844. doi: 10.1038/s41598-024-66897-y.

HBCVTr: an end-to-end transformer with a deep neural network hybrid model for anti-HBV and HCV activity predictor from SMILES.

Sci Rep. 2024 Apr 22;14(1):9262. doi: 10.1038/s41598-024-59933-4.

References

SELFIES and the future of molecular string representations.

Patterns (N Y). 2022 Oct 14;3(10):100588. doi: 10.1016/j.patter.2022.100588.

Antimalarial Drug Predictions Using Molecular Descriptors and Machine Learning against Plasmodium Falciparum.

Biomolecules. 2021 Nov 24;11(12):1750. doi: 10.3390/biom11121750.

AlphaFold2 and the future of structural biology.

Nat Struct Mol Biol. 2021 Sep;28(9):704-705. doi: 10.1038/s41594-021-00650-1.

Inductive transfer learning for molecular activity prediction: Next-Gen QSAR Models with MolPMoFiT.

J Cheminform. 2020 Apr 22;12(1):27. doi: 10.1186/s13321-020-00430-x.

Evaluating Protein Transfer Learning with TAPE.

Adv Neural Inf Process Syst. 2019 Dec;32:9689-9701.

Predicting Binding from Screening Assays with Transformer Network Embeddings.

J Chem Inf Model. 2020 Sep 28;60(9):4191-4199. doi: 10.1021/acs.jcim.9b01212. Epub 2020 Jul 1.

Classification models for predicting the antimalarial activity against .

SAR QSAR Environ Res. 2020 Apr;31(4):313-324. doi: 10.1080/1062936X.2020.1740890. Epub 2020 Mar 19.

Deep Learning-driven research for drug discovery: Tackling Malaria.

PLoS Comput Biol. 2020 Feb 18;16(2):e1007025. doi: 10.1371/journal.pcbi.1007025. eCollection 2020 Feb.

DeepMalaria: Artificial Intelligence Driven Discovery of Potent Antiplasmodials.

Front Pharmacol. 2020 Jan 15;10:1526. doi: 10.3389/fphar.2019.01526. eCollection 2019.

Molecular Geometry Prediction using a Deep Generative Graph Neural Network.

Sci Rep. 2019 Dec 31;9(1):20381. doi: 10.1038/s41598-019-56773-5.
