Department of Computer Science and Technology, Xiamen University, Xiamen 361005, China.
National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen 361005, China.
Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae164.
Molecular representation learning plays an indispensable role in crucial tasks such as property prediction and drug design. Despite the notable achievements of molecular pre-training models, current methods often fail to capture both the structural and feature semantics of molecular graphs. Moreover, while graph contrastive learning has opened up new prospects, existing augmentation techniques often struggle to preserve the core semantics of the molecular graphs they transform. To overcome these limitations, we propose a gradient-compensated encoder parameter perturbation approach that ensures efficient and stable feature augmentation. By combining augmentation strategies based on attribute masking and parameter perturbation, we introduce MoleMCL, a new MOLEcular pre-training model based on multi-level contrastive learning.
Experimental results demonstrate that MoleMCL captures both the structural and feature semantics of molecular graphs and surpasses current state-of-the-art models in molecular prediction tasks, opening a new avenue for molecular modeling.
The code and data underlying this work are available on GitHub at https://github.com/BioSequenceAnalysis/MoleMCL.
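To make the two augmentation ideas named in the abstract more concrete, the following is a minimal sketch in plain Python, not the MoleMCL implementation. It assumes a toy encoder (a linear map followed by mean pooling), zero-valued attribute masking, Gaussian weight noise scaled by the weight standard deviation, and a standard InfoNCE loss. All function names and parameters (`mask_attributes`, `perturb_weights`, `eta`, `temperature`) are hypothetical, and the gradient-compensation step is omitted because the abstract does not specify it.

```python
import math
import random


def mask_attributes(node_feats, mask_rate, rng, mask_value=0.0):
    """Structure-level view (assumed form): blank out the features of a random subset of nodes."""
    n = len(node_feats)
    k = max(1, int(round(mask_rate * n)))
    masked = set(rng.sample(range(n), k))
    view = [[mask_value] * len(f) if i in masked else list(f)
            for i, f in enumerate(node_feats)]
    return view, masked


def perturb_weights(weights, eta, rng):
    """Feature-level view (assumed form): add Gaussian noise scaled by eta * std(weights)."""
    flat = [w for row in weights for w in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((w - mean) ** 2 for w in flat) / len(flat))
    return [[w + eta * std * rng.gauss(0.0, 1.0) for w in row] for row in weights]


def encode(node_feats, weights):
    """Toy encoder (assumption): per-node linear map, then mean pooling over nodes."""
    dim_out = len(weights[0])
    pooled = [0.0] * dim_out
    for f in node_feats:
        for j in range(dim_out):
            pooled[j] += sum(f[i] * weights[i][j] for i in range(len(f)))
    return [p / len(node_feats) for p in pooled]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def info_nce(anchors, positives, temperature=0.5):
    """InfoNCE loss: positives[i] is the positive for anchors[i]; the rest act as negatives."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(anchors)
```

In this sketch, a batch of molecules would be encoded once with the original weights and once with either masked inputs or perturbed weights, and `info_nce` would pull the two embeddings of the same molecule together. How MoleMCL combines the levels and compensates gradients is described in the paper and the linked repository, not here.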