School of Computer Science and Engineering, Sun Yat-sen University, Waihuan East Street, Guangzhou 510006, China.
Brief Bioinform. 2024 Sep 23;25(6). doi: 10.1093/bib/bbae474.
In this paper, we propose DGCL, a contrastive learning (CL) method based on dual graph neural networks (GNNs) and integrated with mixed molecular fingerprints (MFPs) for molecular property prediction. The DGCL-MFP method consists of two stages. In the first stage, pretraining, we use two different GNNs as encoders to construct the contrastive objective, rather than generating augmented graphs as in previous work. Specifically, DGCL aggregates and enhances features of the same molecule with the Graph Isomorphism Network (GIN) and the Graph Attention Network (GAT); the two representations extracted from the same molecule serve as a positive pair, and representations of other molecules serve as negatives. In the second stage, downstream task training, features extracted from the two pretrained graph networks are concatenated with carefully selected MFPs to predict molecular properties. Our experiments show that DGCL improves the performance of existing GNNs, matching or surpassing state-of-the-art self-supervised learning models on multiple benchmark datasets. Specifically, DGCL increases the average performance on classification tasks by 3.73% and improves performance on the regression task Lipo by 0.126. Through ablation studies, we validate the impact of network fusion strategies and MFPs on model performance. In addition, DGCL's predictive performance is further enhanced by weighting different molecular features based on the Extended Connectivity Fingerprint (ECFP). The code and datasets of DGCL will be made publicly available.
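The pretraining objective described above can be illustrated with a minimal sketch. The abstract does not state the exact loss, so this assumes an InfoNCE-style (NT-Xent-like) cross-entropy over cosine similarities, computed in one direction only. The function names, the `temperature` parameter, and the toy vectors are hypothetical; the plain lists stand in for embeddings that the Graph Isomorphism Network and the Graph Attention Network would produce for the same batch of molecules.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity_matrix(a, b):
    """Entry [i][j] is the cosine similarity between a[i] and b[j]."""
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def info_nce_loss(z_gin, z_gat, temperature=0.1):
    """Assumed contrastive loss: row i of z_gin and row i of z_gat come from
    the same molecule (positive pair); every other row of z_gat is a negative.
    """
    sims = cosine_similarity_matrix(z_gin, z_gat)
    total = 0.0
    for i, row in enumerate(sims):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # -log softmax of the positive pair
    return total / len(sims)


# Toy stand-ins for the two encoders' outputs on a batch of two molecules.
z_gin = [[1.0, 0.0], [0.0, 1.0]]
z_gat = [[0.9, 0.1], [0.1, 0.9]]
aligned_loss = info_nce_loss(z_gin, z_gat)
mismatched_loss = info_nce_loss(z_gin, z_gat[::-1])
print(aligned_loss < mismatched_loss)
```

The loss is small when the two views of each molecule agree and large when the rows are mismatched, which is the signal that pulls the two encoders' representations of the same molecule together. A symmetric variant would average this value with `info_nce_loss(z_gat, z_gin)`.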