Zu Shuaishuai, Li Li, Shen Jun, Tang Weitao
School of Computer and Information Science, Southwest University, China.
Neural Netw. 2025 Jul;187:107367. doi: 10.1016/j.neunet.2025.107367. Epub 2025 Mar 13.
Graph embedding aims to embed the information of graph data into a low-dimensional representation space. Prior methods generally suffer from an imbalance between preserving structural information and preserving node features, owing to their pre-defined inductive biases, which leads to unsatisfactory generalization performance. To preserve maximal information, graph contrastive learning (GCL) has become a prominent technique for learning discriminative embeddings. However, in contrast with graph-level embeddings, existing GCL methods generally learn less discriminative node embeddings in a self-supervised way. In this paper, we ascribe the above problem to two challenges: (1) graph data augmentations, which are designed to generate contrastive representations, hurt the original semantic information of nodes; (2) nodes within the same cluster are selected as negative samples. To alleviate these challenges, we propose the Contrastive Graph Auto-Encoder (CGAE) and the Contrastive Variational Graph Auto-Encoder (CVGAE). Specifically, we first propose two distribution-dependent regularizations that guide the parallel encoders to generate contrastive representations following similar distributions, together with theoretical derivations verifying the equivalence of these regularizations. Then, we employ a truncated triplet loss, which selects only the top-k nodes as negative samples, to avoid over-separating nodes affiliated with the same cluster. Furthermore, we give a theoretical analysis of the effectiveness of our models. Experiments on several real-world datasets show that our models achieve superior performance over all baselines in link prediction, node clustering, and graph visualization tasks.
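As one concrete illustration of the negative-sampling idea described in the abstract, the sketch below implements a truncated triplet loss that, for each anchor node, keeps only the k most similar candidates as negatives instead of contrasting against all other nodes. This is a minimal reading of the abstract, not the authors' implementation: the selection criterion (similarity-based top-k), the cosine similarity measure, the margin, and the helper name `truncated_triplet_loss` are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def truncated_triplet_loss(anchor, positive, candidates, k=5, margin=1.0):
    """Illustrative truncated triplet loss (hypothetical sketch).

    anchor:     (N, d) node embeddings from one encoder/view
    positive:   (N, d) matching embeddings from the other view
    candidates: (M, d) pool of candidate negative embeddings
    Only the top-k candidates per anchor are kept as negatives,
    rather than treating every other node as a negative sample.
    """
    # Select negative indices without tracking gradients; only indices are needed.
    with torch.no_grad():
        sim = F.cosine_similarity(
            anchor.unsqueeze(1), candidates.unsqueeze(0), dim=-1
        )  # (N, M) pairwise similarities
        if candidates.shape[0] == anchor.shape[0]:
            sim.fill_diagonal_(float("-inf"))  # never pick a node as its own negative
        _, topk_idx = sim.topk(k, dim=1)  # (N, k); similarity-based top-k is an assumed criterion

    negatives = candidates[topk_idx]  # (N, k, d)

    # Squared-distance triplet terms, averaged over the k retained negatives.
    d_pos = (anchor - positive).pow(2).sum(dim=-1, keepdim=True)    # (N, 1)
    d_neg = (anchor.unsqueeze(1) - negatives).pow(2).sum(dim=-1)    # (N, k)
    return F.relu(d_pos - d_neg + margin).mean()


# Toy usage with random embeddings standing in for the two encoders' outputs.
if __name__ == "__main__":
    z1 = torch.randn(100, 32)               # embeddings from one encoder
    z2 = z1 + 0.1 * torch.randn(100, 32)    # perturbed contrastive view
    print(truncated_triplet_loss(z1, z2, z1, k=5).item())
```

Restricting the loss to k hard negatives, rather than the full node set, is what keeps same-cluster nodes from being pushed apart wholesale; the actual CGAE/CVGAE objective combines this with the distribution-dependent regularizations described above.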