Yang Liang, Ge Meng, Jin Di, He Dongxiao, Fu Huazhu, Wang Jing, Cao Xiaochun
School of Information Engineering, Tianjin University of Commerce, Tianjin, China.
State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China.
PLoS One. 2017 Jul 5;12(7):e0178029. doi: 10.1371/journal.pone.0178029. eCollection 2017.
Driven by the demand for better performance and the availability of prior information, semi-supervised community detection with pairwise constraints has become a hot topic. Most existing methods successfully encode must-link constraints but neglect their opposite, cannot-link constraints, which enforce exclusion between nodes. In this paper, we aim to understand the role of cannot-link constraints and to encode pairwise constraints effectively. To these ends, we define an integrated generative process that jointly considers the network topology, must-link constraints, and cannot-link constraints. We characterize this process as a Multi-variance Mixed Gaussian Generative (MMGG) model, which accommodates the differing degrees of confidence in the network topology and the pairwise constraints, and formulate it as a weighted nonnegative matrix factorization problem. Experiments on artificial and real-world networks not only demonstrate the superiority of the proposed MMGG model but also, most importantly, reveal the roles of pairwise constraints: although must-link constraints are more important than cannot-link constraints when only one type is available, the two are equally important when both are available. To the best of our knowledge, this is the first work to discover and explore the importance of cannot-link constraints in semi-supervised community detection.
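To make the weighted nonnegative matrix factorization formulation concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each pairwise observation is Gaussian around the model's prediction with a source-specific variance, so that the weight on each entry acts like an inverse variance; the paper's exact formulation may differ. The function name `semi_supervised_wnmf`, the parameters `w_topology` and `w_constraint`, the damped multiplicative update, and the toy network are all illustrative assumptions.

```python
import numpy as np


def semi_supervised_wnmf(adj, must_link, cannot_link, k,
                         w_topology=1.0, w_constraint=5.0,
                         n_iter=1000, seed=0):
    """Hypothetical sketch: minimise sum_ij W_ij * (T_ij - (H H^T)_ij)^2 with H >= 0.

    T is the adjacency matrix overwritten by the pairwise constraints, and
    W holds a per-entry confidence (assumed here to be an inverse variance).
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]

    target = adj.copy()
    weight = np.full((n, n), w_topology)
    for i, j in must_link:      # must-link: the pair should share a community
        target[i, j] = target[j, i] = 1.0
        weight[i, j] = weight[j, i] = w_constraint
    for i, j in cannot_link:    # cannot-link: the pair should be separated
        target[i, j] = target[j, i] = 0.0
        weight[i, j] = weight[j, i] = w_constraint
    np.fill_diagonal(weight, 0.0)  # assumption: self-pairs are ignored

    rng = np.random.default_rng(seed)
    h = rng.random((n, k)) + 0.1   # strictly positive initial memberships
    eps = 1e-12
    for _ in range(n_iter):
        numer = (weight * target) @ h
        denom = (weight * (h @ h.T)) @ h + eps
        # Damped multiplicative update (a heuristic); keeps H nonnegative.
        h *= 0.5 + 0.5 * numer / denom
    return h


# Toy network: two 4-node cliques joined by one bridge edge (3, 4).
adj = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                adj[i, j] = 1.0
adj[3, 4] = adj[4, 3] = 1.0

h = semi_supervised_wnmf(adj, must_link=[(0, 1)], cannot_link=[(3, 4)], k=2)
labels = h.argmax(axis=1)  # hard assignment: strongest membership per node
print(labels)
```

In this sketch the cannot-link constraint on the bridge edge overrides the topology for that pair, and raising `w_constraint` relative to `w_topology` expresses greater confidence in the constraints than in the observed edges.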