Li Wang, Zhu En, Wang Siwei, Guo Xifeng
School of Computer Science, National University of Defense Technology, Changsha 410000, China.
School of Cyberspace Science, Dongguan University of Technology, Dongguan 523808, China.
Entropy (Basel). 2023 Oct 10;25(10):1432. doi: 10.3390/e25101432.
Graph clustering is a fundamental and challenging task in unsupervised learning, and it has made great progress thanks to contrastive learning. However, we identify two problems that still need to be addressed: (1) the augmentations in most graph contrastive clustering methods are hand-crafted, which can cause semantic drift; and (2) contrastive learning is usually applied at the feature level while ignoring the structure level, which can lead to sub-optimal performance. In this work, we propose a method termed Graph Clustering with High-Order Contrastive Learning (GCHCL) to solve these problems. First, we construct two views by applying Laplacian smoothing with different normalizations to the raw features, and we design a structure alignment loss that forces the two views to be mapped into the same space. Second, we build a contrastive similarity matrix from two structure-based similarity matrices and force it to align with an identity matrix. In this way, our contrastive learning encompasses a larger neighborhood, enabling the model to learn clustering-friendly embeddings without an extra clustering module. In addition, our model can be trained on large datasets. Extensive experiments on five datasets validate the effectiveness of our model. For example, compared to the second-best baselines on the four small- and medium-sized datasets, our model achieved an average improvement of 3% in accuracy. On the largest dataset, our model achieved an accuracy of 81.92%, whereas the compared baselines ran out of memory.
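The abstract gives no equations, so the following NumPy sketch only roughly illustrates the two ideas it describes: building two views by Laplacian smoothing of the raw features under two different adjacency normalizations (symmetric and random-walk here, an assumption, since the abstract does not say which normalizations are used), and pushing a cross-view similarity matrix toward the identity. The function names, the 2-hop smoothing depth, and the squared-error-to-identity loss are illustrative choices, not the paper's actual GCHCL implementation.

import numpy as np

def laplacian_smooth(adj, feats, norm="sym", hops=2):
    # Add self-loops, then propagate features with a normalized adjacency:
    # X <- (D^-1/2 A_hat D^-1/2) X (symmetric) or (D^-1 A_hat) X (random-walk).
    n = adj.shape[0]
    a_hat = adj + np.eye(n)
    deg = a_hat.sum(axis=1)
    if norm == "sym":
        d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
        a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt
    else:  # random-walk normalization
        a_norm = np.diag(1.0 / deg) @ a_hat
    x = feats.copy()
    for _ in range(hops):
        x = a_norm @ x
    return x

def identity_alignment_loss(z1, z2):
    # Cross-view similarity from L2-normalized embeddings, aligned to the
    # identity: the same node across views should be similar, others not.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T                      # (n, n) cross-view similarity matrix
    return np.mean((sim - np.eye(sim.shape[0])) ** 2)

# Toy usage: a 4-node graph with 3-dimensional features.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
feats = np.random.rand(4, 3)
view_sym = laplacian_smooth(adj, feats, norm="sym")
view_rw = laplacian_smooth(adj, feats, norm="rw")
print(identity_alignment_loss(view_sym, view_rw))

In the paper the two views would additionally be passed through a trainable encoder and the alignment would act on structure-based similarity matrices; the sketch above only shows the smoothing-based view construction and the align-to-identity objective in their simplest form.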