
Suppr 超能文献



Complementary multi-modality molecular self-supervised learning via non-overlapping masking for property prediction.

Affiliations

Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 131 Dong'an Road, 200032, Shanghai, China.

Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, 131 Dong'an Road, 200032, Shanghai, China.

Publication Information

Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae256.

DOI: 10.1093/bib/bbae256
PMID: 38801702
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11129775/
Abstract

Self-supervised learning plays an important role in molecular representation learning because labeled molecular data are usually limited in many tasks, such as chemical property prediction and virtual screening. However, most existing molecular pre-training methods focus on one modality of molecular data, and the complementary information of two important modalities, SMILES and graph, is not fully explored. In this study, we propose an effective multi-modality self-supervised learning framework for molecular SMILES and graph. Specifically, SMILES data and graph data are first tokenized so that they can be processed by a unified Transformer-based backbone network, which is trained by a masked reconstruction strategy. In addition, we introduce a specialized non-overlapping masking strategy to encourage fine-grained interaction between these two modalities. Experimental results show that our framework achieves state-of-the-art performance in a series of molecular property prediction tasks, and a detailed ablation study demonstrates efficacy of the multi-modality framework and the masking strategy.
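The non-overlapping masking idea described above can be sketched in a few lines: atom positions are partitioned into two disjoint sets, one masked in the SMILES view and the other in the graph view, so that each modality must recover its masked tokens from information visible in the complementary modality. This is a minimal illustrative sketch, not the paper's implementation; the function name and the 50/50 split are assumptions.

```python
import random

def non_overlapping_masks(num_atoms, mask_ratio=0.5, seed=None):
    """Partition atom indices into two disjoint mask sets, one per modality.

    Hypothetical sketch of non-overlapping masking: an atom masked in the
    SMILES view stays visible in the graph view (and vice versa), so each
    modality can reconstruct its masked tokens from the other.
    """
    rng = random.Random(seed)
    indices = list(range(num_atoms))
    rng.shuffle(indices)
    cut = int(num_atoms * mask_ratio)  # how many atoms the SMILES view hides
    smiles_mask = sorted(indices[:cut])
    graph_mask = sorted(indices[cut:])
    return smiles_mask, graph_mask

# The two mask sets never overlap and together cover every atom.
smiles_mask, graph_mask = non_overlapping_masks(8, mask_ratio=0.5, seed=0)
assert set(smiles_mask).isdisjoint(graph_mask)
assert set(smiles_mask) | set(graph_mask) == set(range(8))
```

During pre-training, each masked position would then be reconstructed by the shared Transformer backbone from the tokens left visible across both modalities.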


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdaf/11129775/5360b524fbde/bbae256f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdaf/11129775/85cb2cb9da4d/bbae256f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdaf/11129775/cb57e6bd10d1/bbae256f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdaf/11129775/68fadfb9391d/bbae256f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdaf/11129775/b7e82d828ffb/bbae256f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdaf/11129775/4897f92344f5/bbae256f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdaf/11129775/b7d2338d0c41/bbae256f7.jpg

Similar Articles

1. Complementary multi-modality molecular self-supervised learning via non-overlapping masking for property prediction.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae256.
2. A modality-collaborative convolution and transformer hybrid network for unpaired multi-modal medical image segmentation with limited annotations.
Med Phys. 2023 Sep;50(9):5460-5478. doi: 10.1002/mp.16338. Epub 2023 Mar 15.
3. Hierarchical Molecular Graph Self-Supervised Learning for property prediction.
Commun Chem. 2023 Feb 17;6(1):34. doi: 10.1038/s42004-023-00825-5.
4. FTMMR: Fusion Transformer for Integrating Multiple Molecular Representations.
IEEE J Biomed Health Inform. 2024 Jul;28(7):4361-4372. doi: 10.1109/JBHI.2024.3383221. Epub 2024 Jul 2.
5. Attention-wise masked graph contrastive learning for predicting molecular property.
Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac303.
6. BatmanNet: bi-branch masked graph transformer autoencoder for molecular representation.
Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad400.
7. Adaptive self-supervised learning for sequential recommendation.
Neural Netw. 2024 Nov;179:106570. doi: 10.1016/j.neunet.2024.106570. Epub 2024 Jul 24.
8. Self-Supervised Molecular Representation Learning With Topology and Geometry.
IEEE J Biomed Health Inform. 2025 Jan;29(1):700-710. doi: 10.1109/JBHI.2024.3479194. Epub 2025 Jan 7.
9. MSLTE: multiple self-supervised learning tasks for enhancing EEG emotion recognition.
J Neural Eng. 2024 Apr 17;21(2). doi: 10.1088/1741-2552/ad3c28.
10. Self-Supervised Pre-Training via Multi-View Graph Information Bottleneck for Molecular Property Prediction.
IEEE J Biomed Health Inform. 2024 Dec;28(12):7659-7669. doi: 10.1109/JBHI.2024.3422488. Epub 2024 Dec 5.

Cited By

1. ProteinF3S: boosting enzyme function prediction by fusing protein sequence, structure, and surface.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae695.

References

1. ProteinMAE: masked autoencoder for protein surface self-supervised learning.
Bioinformatics. 2023 Dec 1;39(12). doi: 10.1093/bioinformatics/btad724.
2. FG-BERT: a generalized and self-supervised functional group-based molecular representation learning framework for properties prediction.
Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad398.
3. TransFoxMol: predicting molecular property with focused attention.
Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad306.
4. A Multimodal Protein Representation Framework for Quantifying Transferability Across Biochemical Downstream Tasks.
Adv Sci (Weinh). 2023 Aug;10(22):e2301223. doi: 10.1002/advs.202301223. Epub 2023 May 30.
5. Extending machine learning beyond interatomic potentials for predicting molecular properties.
Nat Rev Chem. 2022 Sep;6(9):653-672. doi: 10.1038/s41570-022-00416-3. Epub 2022 Aug 25.
6. SMICLR: Contrastive Learning on Multiple Molecular Representations for Semisupervised and Unsupervised Representation Learning.
J Chem Inf Model. 2022 Sep 12;62(17):3948-3960. doi: 10.1021/acs.jcim.2c00521. Epub 2022 Aug 31.
7. Self-Supervised Learning of Graph Neural Networks: A Unified Review.
IEEE Trans Pattern Anal Mach Intell. 2023 Feb;45(2):2412-2429. doi: 10.1109/TPAMI.2022.3170559. Epub 2023 Jan 6.
8. Graph neural network approaches for drug-target interactions.
Curr Opin Struct Biol. 2022 Apr;73:102327. doi: 10.1016/j.sbi.2021.102327. Epub 2022 Jan 21.
9. Review of unsupervised pretraining strategies for molecules representation.
Brief Funct Genomics. 2021 Sep 11;20(5):323-332. doi: 10.1093/bfgp/elab036.
10. MG-BERT: leveraging unsupervised atomic representation learning for molecular property prediction.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab152.