Complementary multi-modality molecular self-supervised learning via non-overlapping masking for property prediction.

Affiliations

Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 131 Dong'an Road, 200032, Shanghai, China.

Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, 131 Dong'an Road, 200032, Shanghai, China.

Publication information

Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae256.

Abstract

Self-supervised learning plays an important role in molecular representation learning because labeled molecular data are usually limited in many tasks, such as chemical property prediction and virtual screening. However, most existing molecular pre-training methods focus on a single modality of molecular data, and the complementary information in two important modalities, SMILES and graph, has not been fully explored. In this study, we propose an effective multi-modality self-supervised learning framework for molecular SMILES and graph data. Specifically, SMILES data and graph data are first tokenized so that they can be processed by a unified Transformer-based backbone network, which is trained with a masked reconstruction strategy. In addition, we introduce a specialized non-overlapping masking strategy to encourage fine-grained interaction between the two modalities. Experimental results show that our framework achieves state-of-the-art performance on a series of molecular property prediction tasks, and a detailed ablation study demonstrates the efficacy of both the multi-modality framework and the masking strategy.
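The abstract does not spell out how the non-overlapping masks are drawn, so the following is only a minimal sketch of one plausible reading: each atom is masked in at most one of the two modalities, so that a token hidden in the SMILES view can be recovered from the graph view and vice versa. The function names (`non_overlapping_masks`, `apply_mask`), the `mask_ratio` value, the `[MASK]` placeholder, and the assumption that SMILES atom tokens and graph nodes share one atom index are all hypothetical and are not taken from the paper.

```python
import random


def non_overlapping_masks(num_atoms, mask_ratio=0.25, seed=None):
    """Draw two disjoint sets of atom indices to mask (hypothetical helper).

    Assumes SMILES atom tokens and graph nodes are aligned by atom index.
    Non-atom SMILES tokens (bonds, brackets, ring closures) are ignored here.
    """
    if not 0.0 <= mask_ratio <= 0.5:
        raise ValueError("mask_ratio must be in [0, 0.5] so the masks can be disjoint")
    rng = random.Random(seed)
    k = int(num_atoms * mask_ratio)
    # Shuffle the atom indices once, then take two disjoint slices.
    order = list(range(num_atoms))
    rng.shuffle(order)
    smiles_masked = set(order[:k])
    graph_masked = set(order[k:2 * k])
    return smiles_masked, graph_masked


def apply_mask(tokens, masked, mask_token="[MASK]"):
    """Replace the tokens at the masked indices with a placeholder."""
    return [mask_token if i in masked else t for i, t in enumerate(tokens)]


# Toy molecule given as a list of atom symbols (illustrative only).
atoms = ["C", "C", "O", "N", "C", "C", "O", "C"]
smiles_masked, graph_masked = non_overlapping_masks(len(atoms), mask_ratio=0.25, seed=0)

smiles_view = apply_mask(atoms, smiles_masked)  # input to the SMILES branch
graph_view = apply_mask(atoms, graph_masked)    # node features for the graph branch
```

Because the two index sets are slices of one permutation, every atom stays visible in at least one view, which is the property that would let a shared backbone reconstruct a masked token from the other modality. The paper's actual tokenization and masking granularity may differ from this sketch.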

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdaf/11129775/5360b524fbde/bbae256f1.jpg
