Makarov Ilya, Kiselev Dmitrii, Nikitinsky Nikita, Subelj Lovro
HSE University, Moscow, Russia.
Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia.
PeerJ Comput Sci. 2021 Feb 4;7:e357. doi: 10.7717/peerj-cs.357. eCollection 2021.
Dealing with relational data always required significant computational resources, domain expertise and task-dependent feature engineering to incorporate structural information into a predictive model. Nowadays, a family of automated graph feature engineering techniques has been proposed in different streams of literature. So-called graph embeddings provide a powerful tool to construct vectorized feature spaces for graphs and their components, such as nodes, edges and subgraphs under preserving inner graph properties. Using the constructed feature spaces, many machine learning problems on graphs can be solved via standard frameworks suitable for vectorized feature representation. Our survey aims to describe the core concepts of graph embeddings and provide several taxonomies for their description. First, we start with the methodological approach and extract three types of graph embedding models based on matrix factorization, random-walks and deep learning approaches. Next, we describe how different types of networks impact the ability of models to incorporate structural and attributed data into a unified embedding. Going further, we perform a thorough evaluation of graph embedding applications to machine learning problems on graphs, among which are node classification, link prediction, clustering, visualization, compression, and a family of the whole graph embedding algorithms suitable for graph classification, similarity and alignment problems. Finally, we overview the existing applications of graph embeddings to computer science domains, formulate open problems and provide experiment results, explaining how different networks properties result in graph embeddings quality in the four classic machine learning problems on graphs, such as node classification, link prediction, clustering and graph visualization. As a result, our survey covers a new rapidly growing field of network feature engineering, presents an in-depth analysis of models based on network types, and overviews a wide range of applications to machine learning problems on graphs.
处理关系型数据总是需要大量的计算资源、领域专业知识以及依赖任务的特征工程,以便将结构信息整合到预测模型中。如今,不同文献流派中已经提出了一系列自动化的图特征工程技术。所谓的图嵌入为构建图及其组件(如节点、边和子图)的矢量化特征空间提供了一个强大的工具,同时保留图的内部属性。利用构建好的特征空间,许多关于图的机器学习问题可以通过适用于矢量化特征表示的标准框架来解决。我们的综述旨在描述图嵌入的核心概念,并为其描述提供几种分类法。首先,我们从方法论入手,基于矩阵分解、随机游走和深度学习方法提取出三种类型的图嵌入模型。接下来,我们描述不同类型的网络如何影响模型将结构数据和属性数据整合到统一嵌入中的能力。进一步地,我们对图嵌入在图的机器学习问题中的应用进行了全面评估,其中包括节点分类、链接预测、聚类、可视化、压缩,以及适用于图分类、相似性和对齐问题的一系列全图嵌入算法。最后,我们概述了图嵌入在计算机科学领域的现有应用,提出了开放性问题并提供了实验结果,解释了不同的网络属性如何在图的四个经典机器学习问题(如节点分类、链接预测、聚类和图可视化)中导致图嵌入的质量差异。因此,我们的综述涵盖了一个新的快速发展的网络特征工程领域,对基于网络类型的模型进行了深入分析,并概述了在图的机器学习问题上的广泛应用。