Chen Junyang, Gong Zhiguo, Wang Wei, Wang Cong, Xu Zhenghua, Lv Jianming, Li Xueliang, Wu Kaishun, Liu Weiwen
IEEE Trans Neural Netw Learn Syst. 2022 Dec;33(12):7079-7090. doi: 10.1109/TNNLS.2021.3084195. Epub 2022 Nov 30.
Network representation learning (NRL) has far-reaching effects on data mining research and has proven important in many real-world applications. NRL, also known as network embedding, aims to preserve graph structure in a low-dimensional space. The learned representations can be used for downstream machine learning tasks such as vertex classification, link prediction, and data visualization. Recently, graph convolutional network (GCN)-based models, e.g., GraphSAGE, have drawn a lot of attention for their success in inductive NRL. When conducting unsupervised learning on large-scale graphs, some of these models employ negative sampling (NS) for optimization, which encourages a target vertex to be close to its neighbors while being far from its negative samples. However, NS draws negative vertices either uniformly at random or according to vertex degrees, so the generated samples can be either highly relevant or completely unrelated to the target vertex. Moreover, as training proceeds, the gradient of the NS objective, computed from the inner product between an unrelated negative sample and the target vertex, may approach zero, which leads to inferior representations. To address these problems, we propose an adversarial training method tailored for unsupervised inductive NRL on large networks. To efficiently keep track of high-quality negative samples, we design a caching scheme with sampling and updating strategies that explores vertex proximity broadly while keeping training costs in check. Moreover, the proposed method adapts to various existing GCN-based models without significantly complicating their optimization. Extensive experiments show that the proposed method outperforms state-of-the-art models.
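To make the vanishing-gradient claim concrete: in GraphSAGE-style unsupervised training, the NS objective for a target vertex u with neighbor v typically has the form -log σ(z_u·z_v) - Q·E_{v_n~P_n}[log σ(-z_u·z_{v_n})], so the gradient of each negative term with respect to z_u is σ(z_u·z_{v_n})·z_{v_n}. Once an unrelated negative is already far from the target, this factor is near zero and the sample contributes nothing to learning. The minimal numpy sketch below illustrates the effect; the embeddings and the two example negatives are hypothetical, not taken from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ns_gradient_wrt_target(z_u, z_neg):
    # Gradient of the negative-sample term -log(sigmoid(-z_u . z_neg))
    # with respect to z_u: sigmoid(z_u . z_neg) * z_neg.
    return sigmoid(z_u @ z_neg) * z_neg

rng = np.random.default_rng(0)
z_u = rng.normal(size=64)
z_u /= np.linalg.norm(z_u)

# A hard negative still close to the target: yields a sizeable gradient.
z_hard = z_u + 0.1 * rng.normal(size=64)
# An unrelated negative already pushed far away: the gradient nearly vanishes.
z_easy = -5.0 * z_u

print(np.linalg.norm(ns_gradient_wrt_target(z_u, z_hard)))  # sizeable
print(np.linalg.norm(ns_gradient_wrt_target(z_u, z_easy)))  # near zero
```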
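The abstract does not spell out the cache's sampling and updating strategies, so the sketch below is a hypothetical illustration of a generic negative-sample cache, not the paper's algorithm. Per target vertex, it keeps the candidates whose inner product with the target embedding is largest (the "hard" negatives from the sketch above), refreshing them from a small random pool so that exploration of vertex proximity stays cheap:

```python
import heapq
import random

class NegativeCache:
    """Hypothetical per-vertex cache of hard negatives (not the paper's algorithm).

    Candidates are scored by their inner product with the target embedding;
    keeping the top-scoring ones keeps the drawn negatives informative as
    training proceeds, while refreshing from a bounded random pool caps cost.
    `embeddings` maps vertex id -> numpy vector; `neighbors` maps id -> set.
    """

    def __init__(self, cache_size=50, pool_size=200):
        self.cache_size = cache_size
        self.pool_size = pool_size
        self.cache = {}  # target vertex id -> list of cached negative ids

    def update(self, u, embeddings, vertices, neighbors):
        # Sampling strategy: score a random candidate pool, excluding the
        # target and its neighbors, and keep the hardest negatives.
        pool = random.sample(vertices, min(self.pool_size, len(vertices)))
        pool = [v for v in pool if v != u and v not in neighbors[u]]
        scored = ((embeddings[u] @ embeddings[v], v) for v in pool)
        self.cache[u] = [v for _, v in heapq.nlargest(self.cache_size, scored)]

    def sample(self, u, k):
        # Draw k negatives for target u from its cache.
        return random.sample(self.cache[u], min(k, len(self.cache[u])))
```

In an actual training loop, `update` would be invoked only every few steps per batch of targets, so cache maintenance remains a small fraction of the forward/backward cost.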