Yang Liang, Jin Di, Wang Xiao, Cao Xiaochun
[1] State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China [2] School of Information Engineering, Tianjin University of Commerce, Tianjin 300134, China.
School of Computer Science and Technology, Tianjin University, Tianjin 300072, China.
Sci Rep. 2015 Mar 12;5:9039. doi: 10.1038/srep09039.
Several semi-supervised community detection algorithms have been proposed recently to improve the performance of traditional topology-based methods. However, most of them focus on how to integrate supervised information with topology information; few consider which information is critical for performance improvement. As a result, they demand large amounts of supervised information, which is expensive or difficult to obtain in most fields. To address this problem, we propose an active link selection framework: we actively select the most uncertain and informative links for human labeling, so that the supervised information is used efficiently. We also disconnect the most likely inter-community edges to further improve efficiency. Our main idea is that, by connecting uncertain nodes to their community hubs and disconnecting inter-community edges, one can sharpen the block structure of the adjacency matrix more efficiently than by labeling links at random, as existing methods do. Experiments on both synthetic and real networks demonstrate that our approach significantly outperforms existing methods in the efficiency of using supervised information: it needs ~13% of the supervised information to achieve performance similar to that of the original semi-supervised approaches.
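The selection idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how uncertainty, hubs, or inter-community likelihood are measured, so everything below is an assumption made for illustration: uncertainty is taken as the entropy of a soft membership matrix `U`, a hub is taken as the highest-degree node of a community, and an edge is scored as inter-community by one minus the dot product of its endpoints' memberships. The function names and the toy network are hypothetical and are not the authors' implementation.

```python
import numpy as np

def membership_entropy(U):
    """Row-wise Shannon entropy of a soft membership matrix U (n x k)."""
    P = U / U.sum(axis=1, keepdims=True)
    return -(P * np.log(P + 1e-12)).sum(axis=1)

def community_hubs(A, labels):
    """Assumed hub definition: the highest-degree node of each community."""
    degree = A.sum(axis=1)
    hubs = {}
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        hubs[int(c)] = int(members[np.argmax(degree[members])])
    return hubs

def select_queries(A, U, budget):
    """Pick (node, hub) links to show a human labeler, most uncertain nodes first."""
    labels = U.argmax(axis=1)
    hubs = community_hubs(A, labels)
    queries = []
    for node in np.argsort(-membership_entropy(U)):
        hub = hubs[int(labels[node])]
        if int(node) != hub:
            queries.append((int(node), hub))
        if len(queries) == budget:
            break
    return queries

def most_likely_inter_edges(A, U, count):
    """Rank existing edges by the assumed chance that their endpoints differ in community."""
    P = U / U.sum(axis=1, keepdims=True)
    rows, cols = np.nonzero(np.triu(A, k=1))  # each undirected edge once
    cross = 1.0 - (P[rows] * P[cols]).sum(axis=1)
    top = np.argsort(-cross)[:count]
    return [(int(rows[i]), int(cols[i])) for i in top]

def apply_label(A, node, hub, must_link):
    """Return a copy of A with the edge added (must-link) or removed (cannot-link)."""
    B = A.copy()
    B[node, hub] = B[hub, node] = 1 if must_link else 0
    return B

# Toy network: community {0..4} with hub 1, community {5, 6, 7},
# and one inter-community edge (0, 5). Node 0 is the uncertain one.
edges = [(1, 2), (1, 3), (1, 4), (0, 2), (0, 5), (5, 6), (5, 7), (6, 7)]
A = np.zeros((8, 8), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1
U = np.array([[0.55, 0.45]] + [[0.9, 0.1]] * 4 + [[0.1, 0.9]] * 3)

queries = select_queries(A, U, budget=1)
suspects = most_likely_inter_edges(A, U, count=1)
print(queries, suspects)
```

On this toy network the sketch queries the link between the uncertain node 0 and hub 1, and flags edge (0, 5) as the most likely inter-community edge. Applying a must-link answer with `apply_label` and removing the flagged edge makes the two diagonal blocks of `A` denser and the off-diagonal block emptier, which is the sharpening effect the abstract refers to.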