Department of Applied Mathematics, University of Colorado at Boulder, Boulder, CO, United States of America.
PLoS One. 2020 Feb 12;15(2):e0228728. doi: 10.1371/journal.pone.0228728. eCollection 2020.
Comparison of graph structure is a ubiquitous task in data analysis and machine learning, with applications in fields such as neuroscience, cybersecurity, social network analysis, and bioinformatics. Discovery and comparison of structures such as modular communities, rich clubs, hubs, and trees yield insight into the generative mechanisms and functional properties of a graph. Often, two graphs are compared via a pairwise distance measure, with a small distance indicating structural similarity and a large distance indicating dissimilarity. Common choices include spectral distances and distances based on node affinities. However, there has as yet been no comparative study of how well these distance measures distinguish common graph topologies at different structural scales. In this work, we compare commonly used graph metrics and distance measures, and demonstrate their ability to distinguish common topological features found in both random graph models and real-world networks. We put forward a multi-scale picture of graph structure, in which we study the effect of global and local structures on changes in distance measures. Based on this multi-scale view, we make recommendations on the applicability of different distance measures to the analysis of empirical graph data. Finally, we introduce the Python library NetComp, which implements the graph distances used in this work.
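To make the idea of a spectral distance concrete, the following is a minimal sketch and not the NetComp API. It assumes two undirected graphs on the same number of nodes, given as symmetric adjacency matrices, and takes the distance to be the Euclidean norm of the difference between their sorted adjacency eigenvalues. The function name `adjacency_spectral_distance` and the two example graphs are hypothetical and chosen only for illustration.

```python
import numpy as np


def adjacency_spectral_distance(a, b):
    """Euclidean distance between the sorted adjacency spectra of two graphs.

    Illustrative assumption: both inputs are symmetric adjacency matrices
    of the same size (undirected graphs with equal node counts).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1] or a.shape != b.shape:
        raise ValueError("expected two square matrices of the same size")
    # eigvalsh is intended for symmetric matrices and returns
    # eigenvalues in ascending order, so the spectra align index by index.
    spec_a = np.linalg.eigvalsh(a)
    spec_b = np.linalg.eigvalsh(b)
    return float(np.linalg.norm(spec_a - spec_b))


# Two small example graphs on three nodes: a path and a triangle.
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]

d_same = adjacency_spectral_distance(path, path)      # identical graphs
d_diff = adjacency_spectral_distance(path, triangle)  # different topology
```

A graph compared with itself gives a distance of zero, while the path and the triangle give a positive distance. Because this measure depends only on eigenvalues, it also returns zero for non-identical graphs that happen to share a spectrum, so it is a pseudometric rather than a true metric on graphs.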