Department of Computer Science, Princeton University, Princeton, NJ 08540, USA.
Proc Natl Acad Sci U S A. 2013 Sep 3;110(36):14534-9. doi: 10.1073/pnas.1221839110. Epub 2013 Aug 15.
Detecting overlapping communities is essential to analyzing and exploring natural networks such as social networks, biological networks, and citation networks. However, most existing approaches do not scale to the size of networks that we regularly observe in the real world. In this paper, we develop a scalable approach to community detection that discovers overlapping communities in massive real-world networks. Our approach is based on a Bayesian model of networks that allows nodes to participate in multiple communities, and a corresponding algorithm that naturally interleaves subsampling from the network and updating an estimate of its communities. We demonstrate how we can discover the hidden community structure of several real-world networks, including 3.7 million US patents, 575,000 physics articles from the arXiv preprint server, and 875,000 connected Web pages from the Internet. Furthermore, we demonstrate on large simulated networks that our algorithm accurately discovers the true community structure. This paper opens the door to using sophisticated statistical models to analyze massive networks.
检测重叠社区对于分析和探索自然网络(如社交网络、生物网络和引文网络)至关重要。然而,大多数现有的方法无法扩展到我们在现实世界中经常观察到的网络规模。在本文中,我们开发了一种可扩展的方法来检测社区,以发现大规模真实网络中的重叠社区。我们的方法基于网络的贝叶斯模型,该模型允许节点参与多个社区,以及相应的算法,该算法自然地交错从网络中进行抽样和更新对其社区的估计。我们展示了如何发现包括 370 万项美国专利、arXiv 预印本服务器上的 57.5 万篇物理学文章以及互联网上的 87.5 万互联网页在内的多个真实网络的隐藏社区结构。此外,我们在大型模拟网络上证明了我们的算法可以准确地发现真实的社区结构。本文为使用复杂的统计模型分析大规模网络开辟了道路。