Rossi Ryan A, Zhou Rong, Ahmed Nesreen K
IEEE Trans Neural Netw Learn Syst. 2019 Jan;30(1):44-57. doi: 10.1109/TNNLS.2018.2826529. Epub 2018 May 18.
Graphlets are induced subgraphs of a large network and are important for understanding and modeling complex networks. Despite their practical importance, graphlets have been severely limited to applications and domains with relatively small graphs. Most previous work has focused on exact algorithms; however, it is often too expensive to compute graphlets exactly in massive networks with billions of edges, and finding an approximate count is usually sufficient for many applications. In this paper, we propose an unbiased graphlet estimation framework that is: (a) fast with large speedups compared to the state of the art; (b) parallel with nearly linear speedups; (c) accurate with less than 1% relative error; (d) scalable and space efficient for massive networks with billions of edges; and (e) effective for a variety of real-world settings as well as estimating global and local graphlet statistics (e.g., counts). On 300 networks from 20 domains, we obtain <1% relative error for all graphlets. This is vastly more accurate than the existing methods while using less data. Moreover, it takes a few seconds on billion edge graphs (as opposed to days/weeks). These are by far the largest graphlet computations to date.
图小子是大型网络的诱导子图,对于理解和建模复杂网络非常重要。尽管它们具有实际重要性,但图小子在应用和领域方面一直严重局限于相对较小的图。以前的大多数工作都集中在精确算法上;然而,在具有数十亿条边的大规模网络中精确计算图小子通常成本过高,而且对于许多应用来说,找到一个近似计数通常就足够了。在本文中,我们提出了一个无偏图小子估计框架,该框架具有以下特点:(a) 与现有技术相比速度快且加速显著;(b) 具有近线性加速的并行性;(c) 精度高,相对误差小于1%;(d) 对于具有数十亿条边的大规模网络具有可扩展性且空间高效;(e) 对各种实际场景有效,并且可用于估计全局和局部图小子统计量(例如计数)。在来自20个领域的300个网络上,我们对所有图小子的相对误差都小于1%。这比现有方法要精确得多,同时使用的数据更少。此外,在具有数十亿条边的图上只需几秒钟(而不是几天/几周)。这些是迄今为止最大规模的图小子计算。