School of Information Science and Engineering, Lanzhou University, Lanzhou, 730000, China.
Interdiscip Sci. 2024 Sep;16(3):741-754. doi: 10.1007/s12539-024-00632-z. Epub 2024 May 6.
Molecular representation learning can preserve meaningful molecular structure as embedding vectors, which is a necessary prerequisite for molecular property prediction. Yet learning to represent molecules accurately remains challenging. Previous approaches that learn molecular representations end to end may suffer information loss, and they neglect molecular generative representations. To obtain rich molecular feature information, a pre-trained molecular representation model can draw on several different molecular representations, reducing the information loss caused by relying on a single one. We therefore propose MVGC, a multi-view generative contrastive learning pre-training model. Our pre-training framework learns three fundamental feature representations of molecules and integrates them effectively to predict molecular properties on benchmark datasets. Comprehensive experiments on seven classification tasks and three regression tasks show that MVGC outperforms most state-of-the-art approaches. We also explore the potential of MVGC to learn molecular representations with chemical significance.
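The abstract does not specify which three views MVGC uses, what its generative objective is, or how its contrastive loss is defined. As a rough illustration only, the sketch below shows a generic cross-view InfoNCE objective (a standard contrastive technique swapped in here, not necessarily MVGC's loss): embeddings of the same molecule from different views are pulled together, while embeddings of other molecules in the batch act as negatives. The function names `info_nce`, `multi_view_loss`, and `fuse`, the temperature of 0.1, and fusion by concatenation are all hypothetical choices made for this example.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Scale to unit length so dot products become cosine similarities;
    # a zero vector is left unchanged to avoid division by zero.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(view_a: List[Vector], view_b: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss where row i of view_a should match row i of view_b.

    Hypothetical helper: the other rows of view_b serve as in-batch negatives.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature for cand in b]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching row (index i) as the positive.
        total += log_sum_exp - logits[i]
    return total / len(a)


def multi_view_loss(views: List[List[Vector]], temperature: float = 0.1) -> float:
    """Average info_nce over every ordered pair of distinct views (assumed scheme)."""
    losses = [
        info_nce(views[p], views[q], temperature)
        for p in range(len(views))
        for q in range(len(views))
        if p != q
    ]
    return sum(losses) / len(losses)


def fuse(views_for_one_molecule: List[Vector]) -> Vector:
    """Concatenate one molecule's per-view embeddings for a downstream predictor (assumed)."""
    return [x for v in views_for_one_molecule for x in v]
```

For example, with a batch of two molecules whose three views agree, `multi_view_loss([a, a, a])` is close to zero, whereas pairing each molecule with the other molecule's embedding raises the loss sharply. Concatenation is only the simplest possible fusion; the abstract does not say how MVGC combines its views.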