Mall Raghvendra, Langone Rocco, Suykens Johan A K
ESAT-STADIUS, KU Leuven, Leuven, Belgium.
PLoS One. 2014 Jun 20;9(6):e99966. doi: 10.1371/journal.pone.0099966. eCollection 2014.
Kernel spectral clustering corresponds to a weighted kernel principal component analysis problem in a constrained optimization framework. The primal formulation leads to an eigen-decomposition of a centered Laplacian matrix at the dual level. The dual formulation allows to build a model on a representative subgraph of the large scale network in the training phase and the model parameters are estimated in the validation stage. The KSC model has a powerful out-of-sample extension property which allows cluster affiliation for the unseen nodes of the big data network. In this paper we exploit the structure of the projections in the eigenspace during the validation stage to automatically determine a set of increasing distance thresholds. We use these distance thresholds in the test phase to obtain multiple levels of hierarchy for the large scale network. The hierarchical structure in the network is determined in a bottom-up fashion. We empirically showcase that real-world networks have multilevel hierarchical organization which cannot be detected efficiently by several state-of-the-art large scale hierarchical community detection techniques like the Louvain, OSLOM and Infomap methods. We show that a major advantage of our proposed approach is the ability to locate good quality clusters at both the finer and coarser levels of hierarchy using internal cluster quality metrics on 7 real-life networks.
核谱聚类对应于约束优化框架中的加权核主成分分析问题。原始公式在对偶层面导致中心拉普拉斯矩阵的特征分解。对偶公式允许在训练阶段基于大规模网络的代表性子图构建模型,并在验证阶段估计模型参数。KSC模型具有强大的样本外扩展属性,可对大数据网络中未见过的节点进行聚类归属。在本文中,我们在验证阶段利用特征空间中投影的结构来自动确定一组递增的距离阈值。我们在测试阶段使用这些距离阈值为大规模网络获得多个层次级别。网络中的层次结构以自底向上的方式确定。我们通过实验表明,现实世界的网络具有多级层次组织,而诸如Louvain、OSLOM和Infomap方法等几种先进的大规模层次社区检测技术无法有效地检测到这种组织。我们表明,我们提出的方法的一个主要优点是能够使用7个真实网络上的内部聚类质量指标在层次结构的精细和粗糙级别上定位高质量的聚类。