Cao Xiaochun, Wang Xiao, Jin Di, Guo Xiaojie, Tang Xianchao
School of Computer Science and Technology, Tianjin University, Tianjin 300072, China; State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China.
School of Computer Science and Technology, Tianjin University, Tianjin 300072, China.
PLoS One. 2015 Mar 30;10(3):e0119171. doi: 10.1371/journal.pone.0119171. eCollection 2015.
Community detection is a fundamental problem in the analysis of complex networks. Recently, many researchers have concentrated on the detection of overlapping communities, where a vertex may belong to more than one community. However, most current methods require the number (or the size) of the communities as a priori information, which is usually unavailable in real-world networks. Thus, a practical algorithm should not only find the overlapping community structure, but also automatically determine the number of communities. Furthermore, it is preferable if this method is able to reveal the hierarchical structure of networks as well. In this work, we firstly propose a generative model that employs a nonnegative matrix factorization (NMF) formulization with a l(2,1) norm regularization term, balanced by a resolution parameter. The NMF has the nature that provides overlapping community structure by assigning soft membership variables to each vertex; the l(2,1) regularization term is a technique of group sparsity which can automatically determine the number of communities by penalizing too many nonempty communities; and hence the resolution parameter enables us to explore the hierarchical structure of networks. Thereafter, we derive the multiplicative update rule to learn the model parameters, and offer the proof of its correctness. Finally, we test our approach on a variety of synthetic and real-world networks, and compare it with some state-of-the-art algorithms. The results validate the superior performance of our new method.
社区检测是复杂网络分析中的一个基本问题。最近,许多研究人员专注于重叠社区的检测,其中一个顶点可能属于多个社区。然而,目前大多数方法都需要将社区的数量(或规模)作为先验信息,而这在现实世界的网络中通常是不可用的。因此,一个实用的算法不仅应该找到重叠社区结构,还应该自动确定社区的数量。此外,如果该方法还能够揭示网络的层次结构则更好。在这项工作中,我们首先提出一种生成模型,该模型采用带有l(2,1)范数正则化项的非负矩阵分解(NMF)公式化,并由一个分辨率参数进行平衡。NMF的本质是通过为每个顶点分配软隶属度变量来提供重叠社区结构;l(2,1)正则化项是一种组稀疏技术,它可以通过惩罚过多的非空社区来自动确定社区的数量;因此,分辨率参数使我们能够探索网络的层次结构。此后,我们推导了用于学习模型参数的乘法更新规则,并给出了其正确性的证明。最后,我们在各种合成网络和现实世界网络上测试了我们的方法,并将其与一些最先进的算法进行了比较。结果验证了我们新方法的优越性能。