Molecular Descriptors Property Prediction Using Transformer-Based Approach.

Authors

Tran Tuan, Ekenna Chinwe

Affiliation

Department of Computer Science, University at Albany, Albany, NY 12203, USA.

Publication details

Int J Mol Sci. 2023 Jul 26;24(15):11948. doi: 10.3390/ijms241511948.

Abstract

In this study, we introduce semi-supervised machine learning models designed to predict molecular properties. Our model follows a two-stage approach of pre-training and fine-tuning, and leverages a substantial amount of labeled and unlabeled data consisting of SMILES strings, a text representation system for molecules. During the pre-training stage, the model uses the Masked Language Model objective, which is widely used in natural language processing, to learn representations of molecular chemical space. During the fine-tuning stage, the model is trained on a smaller labeled dataset to tackle specific downstream tasks, such as classification or regression. Preliminary results indicate that the model performs comparably to state-of-the-art models on the chosen downstream tasks from MoleculeNet. Additionally, to reduce computational overhead, we propose a new approach that uses 3D compound structures to calculate the attention score in an end-to-end transformer model for predicting anti-malaria drug candidates. The results show that, with the proposed attention score, the end-to-end model achieves performance comparable to that of pre-trained models.
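
To make the pre-training step concrete, the following is a minimal sketch of how masked-language-model inputs might be prepared from a SMILES string. The abstract does not describe the tokenizer or the masking schedule, so everything here is an assumption for illustration: the regex tokenizer, the 15% selection rate, and the 80/10/10 replacement rule are the conventions commonly used for BERT-style models, and the names `tokenize` and `mask_tokens` are hypothetical.

```python
import random
import re

# Assumed regex tokenizer for SMILES (not taken from the paper): bracket atoms,
# two-letter halogens, two-digit ring closures, single atoms, bonds, and digits.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOPSFIbcnops]|[=#\-+\\/().@:~*$]|\d"
)


def tokenize(smiles):
    """Split a SMILES string into tokens; fail if any character is not covered."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize SMILES string: {smiles!r}")
    return tokens


def mask_tokens(tokens, vocab, rng, mask_prob=0.15, mask_token="[MASK]"):
    """Return (inputs, labels) for one masked-language-model training example.

    Each position is selected with probability mask_prob. A selected position
    keeps its original token in labels and is replaced in inputs by the mask
    token (80%), a random vocabulary token (10%), or left unchanged (10%).
    Unselected positions have label None and would be ignored by the loss.
    """
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = mask_token
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)
        # otherwise the original token is kept as the input
    return inputs, labels


if __name__ == "__main__":
    aspirin = tokenize("CC(=O)Oc1ccccc1C(=O)O")
    vocab = sorted(set(aspirin))
    inputs, labels = mask_tokens(aspirin, vocab, random.Random(0))
    print(inputs)
    print(labels)
```

During pre-training, a model would be asked to recover the tokens stored in `labels` from the corrupted `inputs`; the fine-tuning stage described above would then reuse the learned encoder with a task-specific output layer.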


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cb5/10419034/c5d162af476e/ijms-24-11948-g001.jpg
