Pham Phu
Faculty of Information Technology, HUTECH University, 700000, Ho Chi Minh City, Vietnam.
Mol Inform. 2025 Jan;44(1):e202400252. doi: 10.1002/minf.202400252.
Graph representation learning has recently become an active research topic. Graph embeddings have diverse applications across fields such as information and social network analysis, bioinformatics and cheminformatics, natural language processing (NLP), and recommendation systems. Among the deep learning (DL) architectures used for graph representation learning, graph neural networks (GNNs) have emerged as the dominant and highly effective framework. Recent GNN-based methods have demonstrated state-of-the-art performance on complex supervised and unsupervised tasks at both the node and graph levels. To enhance multi-view and structured graph representations, contrastive learning techniques have more recently been applied to graphs, giving rise to graph contrastive learning (GCL) models. These GCL approaches use unsupervised contrastive methods to capture multi-view graph representations by comparing node and graph embeddings, yielding significant improvements in both graph-level representations and task-specific applications such as molecular embedding and classification. However, because most GCL techniques focus primarily on the explicit graph structure through GNN-based encoders, they often overlook critical topological insights that topological data analysis (TDA) could provide. Motivated by research indicating that topological features can greatly benefit various graph learning tasks, we propose a novel topology-enhanced, multi-view graph contrastive learning model called TMGCL. TMGCL is designed to capture and use both comprehensive multi-scale topological information and global structural information from graphs. This enhanced representation capability positions TMGCL to directly support a range of applications, such as molecular classification, with improved accuracy and robustness.
Extensive experiments on two real-world datasets demonstrate the effectiveness of the proposed TMGCL, which outperforms state-of-the-art GNN- and GCL-based baselines.