Islam Muhammad Ifte, Tanvir Farhan, Johnson Ginger, Akbas Esra, Aktas Mehmet Emin
Department of Computer Science, Oklahoma State University, Stillwater, OK, United States.
Department of Computer Science, University of Tulsa, Tulsa, OK, United States.
Front Big Data. 2021 Jan 26;3:608043. doi: 10.3389/fdata.2020.608043. eCollection 2020.
Network embedding, which encodes the structural information of a graph into a low-dimensional vector space, has proven essential for network analysis applications, including node classification and community detection. Although recent methods show promising performance across various applications, graph embedding still faces challenges: the sheer size of a graph may preclude direct application of existing network embedding methods, and locality and noise can compromise accuracy. In this paper, we propose a novel network embedding method, NECL, to generate embeddings more efficiently and effectively. Our goal is to answer the following two questions: 1) Does network compression significantly boost learning? 2) Does network compression improve the quality of the representation? To these ends, first, we propose a novel graph compression method based on neighborhood similarity that compresses the input graph into a smaller graph by incorporating the local proximity of its vertices into super-nodes; second, we embed the compressed graph instead of the original large graph, which lowers the embedding cost while still capturing the global structure of the original graph; third, we refine the embeddings from the compressed graph back to the original graph. NECL is a general meta-strategy that improves the efficiency and effectiveness of many state-of-the-art graph embedding algorithms based on node proximity, including DeepWalk, Node2vec, and LINE. Extensive experiments validate the efficiency and effectiveness of our method, which decreases embedding time and improves classification accuracy, as evaluated on single- and multi-label classification tasks with large real-world graphs.
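To make the compression step described above concrete, the following sketch merges nodes whose neighborhoods are highly similar into super-nodes, using Jaccard similarity as the neighborhood-similarity measure. This is an illustrative assumption of what such a compression could look like, not the authors' exact NECL algorithm: the `compress_graph` function, the greedy union-find merging pass, and the 0.5 threshold are all hypothetical choices. In the full pipeline, an off-the-shelf embedding method (e.g., DeepWalk or Node2vec) would then be run on the smaller super-graph, and each original node would inherit its super-node's embedding as initialization for refinement.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity of two neighbor sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def compress_graph(adj, threshold=0.5):
    """Illustrative compression sketch (not the authors' exact method).

    adj: dict mapping each node to a set of its neighbors.
    Greedily merges node pairs whose neighborhoods have Jaccard
    similarity >= threshold into super-nodes via union-find, then
    builds the super-graph induced by the merged components.
    Returns (super_adj, mapping) where mapping sends each original
    node to its super-node.
    """
    parent = {v: v for v in adj}

    def find(v):
        # Union-find with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # Greedy single pass over node pairs; merge if neighborhoods are similar.
    for u, v in combinations(sorted(adj), 2):
        if find(u) != find(v) and jaccard(adj[u], adj[v]) >= threshold:
            parent[find(v)] = find(u)

    # Build the super-graph: an edge between super-nodes exists if any
    # original edge crosses their components (self-loops dropped).
    super_adj = {}
    for u in adj:
        su = find(u)
        super_adj.setdefault(su, set())
        for w in adj[u]:
            sw = find(w)
            if sw != su:
                super_adj[su].add(sw)
                super_adj.setdefault(sw, set()).add(su)

    mapping = {v: find(v) for v in adj}
    return super_adj, mapping
```

For example, in a 4-node graph where nodes `a` and `b` both connect exactly to `{c, d}` (and vice versa), `a`/`b` and `c`/`d` have identical neighborhoods, so the graph compresses to two connected super-nodes. Embedding that 2-node super-graph is cheaper than embedding the original, and the `mapping` dictionary tells each original node which super-node embedding to start refinement from.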