Ni Jingchao, Cheng Wei, Fan Wei, Zhang Xiang
College of Information Sciences and Technology, Pennsylvania State University, PA 16802 USA.
NEC Laboratories America, NJ 08540 USA.
IEEE Trans Knowl Data Eng. 2018 Mar 1;30(3):435-448. doi: 10.1109/TKDE.2017.2771762. Epub 2017 Nov 9.
Joint clustering of multiple networks has been shown to be more accurate than clustering each network separately. This is because multi-network clustering algorithms typically assume that all networks share a common clustering structure, and that different networks provide compatible and complementary information for uncovering this underlying structure. However, this assumption is too strict to hold in many emerging applications, where multiple networks usually have diverse data distributions. More commonly, the networks under consideration belong to different underlying groups, and only networks in the same group share similar clustering structures. Better clustering performance can be achieved by treating such groups differently. An ideal method should therefore be able to automatically detect network groups, so that networks in the same group share a common clustering structure. To address this problem, we propose a new method, ComClus, to simultaneously group and cluster multiple networks. ComClus is novel in combining the clustering approach of non-negative matrix factorization (NMF) with the feature subspace learning approach of metric learning. Specifically, it treats node clusters as features of networks and learns proper subspaces from such features to distinguish network groups. During the learning process, the two procedures of network grouping and clustering are coupled and mutually enhanced. Moreover, ComClus can effectively leverage prior knowledge on how to group networks, so that network grouping can be conducted in a semi-supervised manner. This enables users to guide the grouping process with domain knowledge and can further boost network clustering accuracy. Extensive experimental evaluations on a variety of synthetic and real datasets demonstrate the effectiveness and scalability of the proposed method.
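The abstract describes ComClus only at a high level. To make the idea of "node clusters as features of networks" concrete, here is a minimal sketch under stated assumptions; it is not ComClus. It assumes all networks share one node set and factorizes each adjacency matrix as A_g ≈ H diag(w_g) H^T with NMF-style multiplicative updates, where the shared non-negative H holds node-to-cluster memberships and w_g says how strongly network g expresses each cluster. The networks are then grouped by plain k-means on the normalized w_g vectors, a simple stand-in for the paper's metric-learning step. Unlike ComClus, grouping here runs once after clustering instead of being coupled with it, and no prior knowledge is used. The factorization form, the function names `fit_shared_clusters`, `group_networks`, and `block_network`, and the toy data are all hypothetical.

```python
import numpy as np


def fit_shared_clusters(networks, k, iters=500, seed=0, eps=1e-12):
    """Jointly fit A_g ~= H @ diag(w_g) @ H.T for every network A_g.

    Hypothetical model, not the ComClus objective.
    H (n x k): non-negative node-to-cluster memberships shared by all networks.
    W (m x k): row g holds w_g, how strongly network g expresses each cluster.
    """
    rng = np.random.default_rng(seed)
    n = networks[0].shape[0]
    H = rng.random((n, k)) + 0.1
    W = rng.random((len(networks), k)) + 0.1
    for _ in range(iters):
        # Damped multiplicative update for H: the gradient of the squared
        # Frobenius loss, split into its positive and negative parts.
        HtH = H.T @ H
        num = np.zeros_like(H)
        den = np.zeros_like(H)
        for A, w in zip(networks, W):
            HD = H * w                      # H @ diag(w)
            num += A @ HD                   # A H D
            den += (HD @ HtH) * w           # H D H^T H D
        H *= 0.5 + 0.5 * num / (den + eps)
        # Multiplicative non-negative least-squares update for each w_g.
        HtH = H.T @ H
        M = HtH * HtH                       # M[c, d] = (h_c . h_d)^2
        for g, A in enumerate(networks):
            b = np.einsum("ic,ij,jc->c", H, A, H)   # b[c] = h_c^T A h_c
            W[g] *= b / (M @ W[g] + eps)
    return H, W


def group_networks(W, n_groups, iters=50):
    """Group networks by k-means on their normalized cluster-weight vectors."""
    X = W / W.sum(axis=1, keepdims=True)    # remove per-network scale
    # Deterministic farthest-point initialization of the group centers.
    centers = [X[0]]
    for _ in range(1, n_groups):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d, axis=1)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(n_groups)])
    return labels


def block_network(blocks, n, weight):
    """Hypothetical toy network: fully connected blocks with a given weight."""
    A = np.zeros((n, n))
    for block in blocks:
        A[np.ix_(block, block)] = weight
    return A


clusters = [[0, 1], [2, 3], [4, 5], [6, 7]]
# Group 1 is dense inside clusters 0 and 1, group 2 inside clusters 2 and 3.
networks = [block_network(clusters[:2], 8, s) for s in (1.0, 0.8, 1.2)]
networks += [block_network(clusters[2:], 8, s) for s in (1.0, 0.9, 1.1)]
H, W = fit_shared_clusters(networks, k=4)
labels = group_networks(W, n_groups=2)
print(labels)
```

Because each w_g is normalized to sum to one before grouping, networks that differ only in overall edge weight (the scale factors 0.8 to 1.2 in the toy data) map to approximately the same point. The grouping step therefore responds to which node clusters a network expresses rather than how dense it is. The 0.5 damping on the H update is a common stabilizing choice for symmetric NMF-style multiplicative updates.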