Leskovec Jure, Sosič Rok
Stanford University.
ACM Trans Intell Syst Technol. 2016 Oct;8(1). doi: 10.1145/2898361. Epub 2016 Oct 3.
Large networks are becoming a widely used abstraction for studying complex systems in a broad set of disciplines, ranging from social network analysis to molecular biology and neuroscience. Despite an increasing need to analyze and manipulate large networks, only a limited number of tools are available for this task. Here, we describe the Stanford Network Analysis Platform (SNAP), a general-purpose, high-performance system that provides easy-to-use, high-level operations for analysis and manipulation of large networks. We present SNAP functionality, describe its implementation details, and give performance benchmarks. SNAP has been developed for single big-memory machines, and it balances the trade-off between maximum performance, compact in-memory graph representation, and the ability to handle dynamic graphs where nodes and edges are being added or removed over time. SNAP can process massive networks with hundreds of millions of nodes and billions of edges. SNAP offers over 140 different graph algorithms that can efficiently manipulate large graphs, calculate structural properties, generate regular and random graphs, and handle attributes and metadata on nodes and edges. Besides being able to handle large graphs, an additional strength of SNAP is that networks and their attributes are fully dynamic: they can be modified during the computation at low cost. SNAP is provided as an open-source library in C++ as well as a module in Python. We also describe the Stanford Large Network Dataset, a set of real-world social and information networks and datasets, which we make publicly available. The collection is a complementary resource to our SNAP software and is widely used for development and benchmarking of graph analytics algorithms.
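The trade-off the abstract names (fast access, compact storage, and cheap addition or removal of nodes and edges) can be illustrated with a minimal sketch. It assumes one plausible layout, a hash table of nodes where each node keeps a sorted list of neighbor ids; the class `DynamicGraph` and its method names are hypothetical, chosen for illustration only, and are not SNAP's API or a claim about its internals.

```python
from bisect import bisect_left


class DynamicGraph:
    """Illustrative undirected graph (hypothetical, not SNAP's API).

    Nodes live in a hash table (dict); each node stores a sorted list of
    neighbor ids, so membership tests use binary search and the structure
    can be modified at any time.
    """

    def __init__(self):
        self._adj = {}  # node id -> sorted list of neighbor ids

    @staticmethod
    def _insert_sorted(nbrs, v):
        # Insert v at its sorted position unless it is already present.
        i = bisect_left(nbrs, v)
        if i == len(nbrs) or nbrs[i] != v:
            nbrs.insert(i, v)

    @staticmethod
    def _remove_sorted(nbrs, v):
        # Remove v if present; do nothing otherwise.
        i = bisect_left(nbrs, v)
        if i < len(nbrs) and nbrs[i] == v:
            nbrs.pop(i)

    def add_node(self, u):
        self._adj.setdefault(u, [])

    def add_edge(self, u, v):
        # Missing endpoints are created on demand.
        self.add_node(u)
        self.add_node(v)
        self._insert_sorted(self._adj[u], v)
        self._insert_sorted(self._adj[v], u)

    def del_edge(self, u, v):
        self._remove_sorted(self._adj.get(u, []), v)
        self._remove_sorted(self._adj.get(v, []), u)

    def del_node(self, u):
        # Drop the node and every edge that touches it.
        for v in self._adj.pop(u, []):
            if v != u:
                self._remove_sorted(self._adj[v], u)

    def has_edge(self, u, v):
        nbrs = self._adj.get(u, [])
        i = bisect_left(nbrs, v)
        return i < len(nbrs) and nbrs[i] == v

    def num_nodes(self):
        return len(self._adj)

    def num_edges(self):
        # Each undirected edge is counted once, from its smaller endpoint.
        return sum(1 for u, nbrs in self._adj.items() for v in nbrs if u <= v)


if __name__ == "__main__":
    g = DynamicGraph()
    g.add_edge(1, 2)
    g.add_edge(2, 3)
    g.add_edge(1, 3)
    g.del_node(2)  # the graph is modified while in use
    print(g.num_nodes(), g.num_edges(), g.has_edge(1, 3))
```

In this sketch, node lookup is a hash-table access and an edge test is a binary search over a sorted list, while an edge insertion or deletion shifts list elements and so costs time proportional to the node's degree. That is the kind of compromise between speed, compactness, and dynamism the abstract refers to.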