Du Rundong, Kuang Da, Drake Barry, Park Haesun
School of Mathematics, Georgia Institute of Technology, 686 Cherry Street, Atlanta, GA 30332-0160 USA.
Department of Mathematics, University of California, Los Angeles, 520 Portola Plaza, Los Angeles, CA 90095-1555 USA.
Comput Soc Netw. 2017;4(1):7. doi: 10.1186/s40649-017-0043-5. Epub 2017 Sep 8.
Community discovery is an important task for revealing structures in large networks. The massive size of contemporary social networks poses a tremendous challenge to the scalability of traditional graph clustering algorithms and the evaluation of discovered communities.
We propose a divide-and-conquer strategy to discover hierarchical community structure, nonoverlapping within each level. Our algorithm is based on the highly efficient rank-2 symmetric nonnegative matrix factorization. We solve several implementation challenges to boost its efficiency on modern computer architectures, specifically for very sparse adjacency matrices that represent a wide range of social networks.
Empirical results have shown that our algorithm has competitive overall efficiency and leading performance in minimizing the average normalized cut, and that the nonoverlapping communities found by our algorithm recover the ground-truth communities better than state-of-the-art algorithms for overlapping community detection. In addition, we present a new dataset of the DBLP computer science bibliography network with richer meta-data and verifiable ground-truth knowledge, which can foster future research in community finding and interpretation of communities in large networks.
社区发现是揭示大型网络结构的一项重要任务。当代社交网络的庞大规模对传统图聚类算法的可扩展性以及所发现社区的评估构成了巨大挑战。
我们提出一种分治策略来发现层次化社区结构,每个层次内不重叠。我们的算法基于高效的二阶对称非负矩阵分解。我们解决了几个实现方面的挑战,以提高其在现代计算机架构上的效率,特别是对于表示各种社交网络的非常稀疏的邻接矩阵。
实证结果表明,我们的算法具有有竞争力的整体效率,并且在最小化平均归一化割方面具有领先性能,而且我们的算法所发现的不重叠社区比用于重叠社区检测的现有算法能更好地恢复真实社区。此外,我们提出了一个具有更丰富元数据和可验证真实知识的DBLP计算机科学文献网络新数据集,这可以促进未来在社区发现和大型网络中社区解释方面的研究。