
Suppr 超能文献



Complementary multi-modality molecular self-supervised learning via non-overlapping masking for property prediction.

Affiliations

Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, 131 Dong'an Road, 200032, Shanghai, China.

Shanghai Key Laboratory of Medical Image Computing and Computer Assisted Intervention, Fudan University, 131 Dong'an Road, 200032, Shanghai, China.

Publication Information

Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae256.

DOI: 10.1093/bib/bbae256
PMID: 38801702
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11129775/
Abstract

Self-supervised learning plays an important role in molecular representation learning because labeled molecular data are usually limited in many tasks, such as chemical property prediction and virtual screening. However, most existing molecular pre-training methods focus on one modality of molecular data, and the complementary information of two important modalities, SMILES and graph, is not fully explored. In this study, we propose an effective multi-modality self-supervised learning framework for molecular SMILES and graph. Specifically, SMILES data and graph data are first tokenized so that they can be processed by a unified Transformer-based backbone network, which is trained by a masked reconstruction strategy. In addition, we introduce a specialized non-overlapping masking strategy to encourage fine-grained interaction between these two modalities. Experimental results show that our framework achieves state-of-the-art performance in a series of molecular property prediction tasks, and a detailed ablation study demonstrates efficacy of the multi-modality framework and the masking strategy.
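The non-overlapping masking idea described above can be sketched in a few lines: atom positions are partitioned into two disjoint sets, one masked in the SMILES view and the other in the graph view, so that each modality must recover its masked tokens from information visible in the complementary modality. This is a minimal illustrative sketch, not the paper's implementation; the function name and the 50/50 split are assumptions.

```python
import random

def non_overlapping_masks(num_atoms, mask_ratio=0.5, seed=None):
    """Partition atom indices into two disjoint mask sets, one per modality.

    Hypothetical sketch of non-overlapping masking: an atom masked in the
    SMILES view stays visible in the graph view (and vice versa), so each
    modality can reconstruct its masked tokens from the other.
    """
    rng = random.Random(seed)
    indices = list(range(num_atoms))
    rng.shuffle(indices)
    cut = int(num_atoms * mask_ratio)  # how many atoms the SMILES view hides
    smiles_mask = sorted(indices[:cut])
    graph_mask = sorted(indices[cut:])
    return smiles_mask, graph_mask

# The two mask sets never overlap and together cover every atom.
smiles_mask, graph_mask = non_overlapping_masks(8, mask_ratio=0.5, seed=0)
assert set(smiles_mask).isdisjoint(graph_mask)
assert set(smiles_mask) | set(graph_mask) == set(range(8))
```

During pre-training, each masked position would then be reconstructed by the shared Transformer backbone from the tokens left visible across both modalities.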


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdaf/11129775/5360b524fbde/bbae256f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdaf/11129775/85cb2cb9da4d/bbae256f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdaf/11129775/cb57e6bd10d1/bbae256f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdaf/11129775/68fadfb9391d/bbae256f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdaf/11129775/b7e82d828ffb/bbae256f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdaf/11129775/4897f92344f5/bbae256f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bdaf/11129775/b7d2338d0c41/bbae256f7.jpg

Similar Articles

1. Complementary multi-modality molecular self-supervised learning via non-overlapping masking for property prediction.
Brief Bioinform. 2024 May 23;25(4). doi: 10.1093/bib/bbae256.
2. A modality-collaborative convolution and transformer hybrid network for unpaired multi-modal medical image segmentation with limited annotations.
Med Phys. 2023 Sep;50(9):5460-5478. doi: 10.1002/mp.16338. Epub 2023 Mar 15.
3. Hierarchical Molecular Graph Self-Supervised Learning for property prediction.
Commun Chem. 2023 Feb 17;6(1):34. doi: 10.1038/s42004-023-00825-5.
4. FTMMR: Fusion Transformer for Integrating Multiple Molecular Representations.
IEEE J Biomed Health Inform. 2024 Jul;28(7):4361-4372. doi: 10.1109/JBHI.2024.3383221. Epub 2024 Jul 2.
5. Attention-wise masked graph contrastive learning for predicting molecular property.
Brief Bioinform. 2022 Sep 20;23(5). doi: 10.1093/bib/bbac303.
6. BatmanNet: bi-branch masked graph transformer autoencoder for molecular representation.
Brief Bioinform. 2023 Nov 22;25(1). doi: 10.1093/bib/bbad400.
7. Adaptive self-supervised learning for sequential recommendation.
Neural Netw. 2024 Nov;179:106570. doi: 10.1016/j.neunet.2024.106570. Epub 2024 Jul 24.
8. Self-Supervised Molecular Representation Learning With Topology and Geometry.
IEEE J Biomed Health Inform. 2025 Jan;29(1):700-710. doi: 10.1109/JBHI.2024.3479194. Epub 2025 Jan 7.
9. MSLTE: multiple self-supervised learning tasks for enhancing EEG emotion recognition.
J Neural Eng. 2024 Apr 17;21(2). doi: 10.1088/1741-2552/ad3c28.
10. Self-Supervised Pre-Training via Multi-View Graph Information Bottleneck for Molecular Property Prediction.
IEEE J Biomed Health Inform. 2024 Dec;28(12):7659-7669. doi: 10.1109/JBHI.2024.3422488. Epub 2024 Dec 5.

Cited By

1. ProteinF3S: boosting enzyme function prediction by fusing protein sequence, structure, and surface.
Brief Bioinform. 2024 Nov 22;26(1). doi: 10.1093/bib/bbae695.

References

1. ProteinMAE: masked autoencoder for protein surface self-supervised learning.
Bioinformatics. 2023 Dec 1;39(12). doi: 10.1093/bioinformatics/btad724.
2. FG-BERT: a generalized and self-supervised functional group-based molecular representation learning framework for properties prediction.
Brief Bioinform. 2023 Sep 22;24(6). doi: 10.1093/bib/bbad398.
3. TransFoxMol: predicting molecular property with focused attention.
Brief Bioinform. 2023 Sep 20;24(5). doi: 10.1093/bib/bbad306.
4. A Multimodal Protein Representation Framework for Quantifying Transferability Across Biochemical Downstream Tasks.
Adv Sci (Weinh). 2023 Aug;10(22):e2301223. doi: 10.1002/advs.202301223. Epub 2023 May 30.
5. Extending machine learning beyond interatomic potentials for predicting molecular properties.
Nat Rev Chem. 2022 Sep;6(9):653-672. doi: 10.1038/s41570-022-00416-3. Epub 2022 Aug 25.
6. SMICLR: Contrastive Learning on Multiple Molecular Representations for Semisupervised and Unsupervised Representation Learning.
J Chem Inf Model. 2022 Sep 12;62(17):3948-3960. doi: 10.1021/acs.jcim.2c00521. Epub 2022 Aug 31.
7. Self-Supervised Learning of Graph Neural Networks: A Unified Review.
IEEE Trans Pattern Anal Mach Intell. 2023 Feb;45(2):2412-2429. doi: 10.1109/TPAMI.2022.3170559. Epub 2023 Jan 6.
8. Graph neural network approaches for drug-target interactions.
Curr Opin Struct Biol. 2022 Apr;73:102327. doi: 10.1016/j.sbi.2021.102327. Epub 2022 Jan 21.
9. Review of unsupervised pretraining strategies for molecules representation.
Brief Funct Genomics. 2021 Sep 11;20(5):323-332. doi: 10.1093/bfgp/elab036.
10. MG-BERT: leveraging unsupervised atomic representation learning for molecular property prediction.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab152.