Ao Jia, Huang Xiangsheng, Dai Wei, Ji Cancan
Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing, 100190, China.
School of Artificial Intelligence, University of Chinese Academy of Sciences, 19 Yuquan Road, Beijing, 100049, China.
J Comput Aided Mol Des. 2025 Sep 13;39(1):77. doi: 10.1007/s10822-025-00658-5.
Because molecules are structurally complex, molecular learning requires large amounts of molecular data. Labeled data, however, is typically limited, which makes self-supervised pretraining methods essential. Current pretraining methods often fail to attend sufficiently to both local and global molecular information. In this study, we propose MoleTGL, a multi-modality self-supervised learning framework that captures local and global information simultaneously. Specifically, we encode SMILES sequences and molecular graphs separately and use a unified fusion approach to strengthen the interaction between the two modalities. Within the molecular graph encoding, we capture global and local information independently and strengthen attention to bond features through information fusion. We also introduce the FA-FFN module to aggregate periodic features of the molecule. Experimental results show that MoleTGL outperforms existing methods on seven classification tasks and six regression tasks in molecular property prediction. Ablation studies confirm the effectiveness of local and global feature fusion and the advantage of the proposed methods for acquiring local and global information.
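The idea of encoding two modalities separately and then fusing them can be illustrated with a minimal sketch. This is not the MoleTGL implementation: the abstract does not specify the fusion operator, so the bidirectional cross-attention, the residual addition, the mean pooling, and all names below (`cross_attention`, `fuse`, the embedding sizes) are assumptions chosen only for illustration. The random arrays stand in for the outputs of a SMILES encoder and a graph encoder.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, keys_values):
    # Scaled dot-product attention without learned projections:
    # each query row attends over all rows of keys_values.
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)  # (n_q, n_kv)
    weights = softmax(scores, axis=-1)             # each row sums to 1
    return weights @ keys_values                   # (n_q, d)


def fuse(smiles_tokens, graph_atoms):
    # Assumed fusion scheme: tokens attend over atoms and atoms attend
    # over tokens, each result is added back residually, both sides are
    # mean-pooled, and the two pooled vectors are concatenated.
    s2g = cross_attention(smiles_tokens, graph_atoms)
    g2s = cross_attention(graph_atoms, smiles_tokens)
    s = (smiles_tokens + s2g).mean(axis=0)
    g = (graph_atoms + g2s).mean(axis=0)
    return np.concatenate([s, g])


# Placeholder embeddings standing in for the two encoders' outputs.
rng = np.random.default_rng(0)
smiles_tokens = rng.normal(size=(5, 8))  # 5 SMILES tokens, dimension 8
graph_atoms = rng.normal(size=(3, 8))    # 3 atoms, dimension 8

fused = fuse(smiles_tokens, graph_atoms)
print(fused.shape)
```

In this sketch the fused vector has twice the embedding dimension and could feed a downstream property-prediction head; a learned model would add projection matrices and train the encoders jointly.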