Wang Jingyu, Ma Zhenyu, Nie Feiping, Li Xuelong
IEEE Trans Neural Netw Learn Syst. 2022 Sep;33(9):4199-4212. doi: 10.1109/TNNLS.2021.3056080. Epub 2022 Aug 31.
By avoiding the need for labeled samples, which are usually scarce in the real world, unsupervised learning has been regarded as a fast and powerful strategy for clustering tasks. However, clustering directly on the original data sets incurs a high computational cost, which limits its application to large-scale and high-dimensional problems. Recently, anchor-based theories have been proposed to partly mitigate this problem and yield a naturally sparse affinity matrix, but achieving excellent performance together with high efficiency remains a challenge. To address this issue, we first present a fast semisupervised framework (FSSF) that combines a balanced K-means-based hierarchical K-means (BKHK) method with bipartite graph theory. We then propose a fast self-supervised clustering method built on this semisupervised framework, in which all labels are inferred from a constructed bipartite graph with exactly k connected components. The proposed method markedly accelerates general semisupervised learning through the use of anchors and consists of four parts: 1) obtaining the anchor set as an interim result through the BKHK algorithm; 2) constructing the bipartite graph; 3) solving the self-supervised problem to construct a typical probability model with FSSF; and 4) selecting the most representative points with respect to the anchors from BKHK as an interim result and conducting label propagation. Experimental results on toy examples and benchmark data sets demonstrate that the proposed method outperforms other approaches.
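Steps 1) and 2) of the pipeline can be illustrated with a minimal sketch. The abstract gives no formulas, so everything specific here is an assumption rather than the authors' implementation: the balanced split is done by sorting points on the difference of their squared distances to two centers, the anchors are taken as the means of the leaf clusters, and the bipartite graph connects each point to its k nearest anchors with a closed-form weighting that is common in anchor-graph work. The function names `balanced_two_means`, `bkhk_anchors`, and `bipartite_graph` are hypothetical.

```python
import numpy as np

def balanced_two_means(X, n_iter=10, seed=0):
    """Split the rows of X into two halves whose sizes differ by at most 1.

    Assumed scheme: sort points by d(x, c0) - d(x, c1) and cut in the middle.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = X[rng.choice(n, size=2, replace=False)]
    half = n // 2
    for _ in range(n_iter):
        # Squared distances of every point to the two centers, shape (n, 2).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        order = np.argsort(d[:, 0] - d[:, 1])
        left, right = order[:half], order[half:]
        centers = np.stack([X[left].mean(axis=0), X[right].mean(axis=0)])
    return left, right

def bkhk_anchors(X, depth, seed=0):
    """Return 2**depth anchors: the means of the leaves of a balanced binary tree."""
    groups = [np.arange(X.shape[0])]
    for _ in range(depth):
        next_groups = []
        for idx in groups:
            left, right = balanced_two_means(X[idx], seed=seed)
            next_groups += [idx[left], idx[right]]
        groups = next_groups
    return np.stack([X[idx].mean(axis=0) for idx in groups])

def bipartite_graph(X, anchors, k):
    """Return an (n, m) matrix Z with at most k nonzeros per row, rows summing to ~1.

    Assumed weighting: z_ij = (d_{i,k+1} - d_ij) / (k * d_{i,k+1} - sum_j d_ij)
    over the k nearest anchors of point i. Requires k < number of anchors.
    """
    d = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)  # (n, m)
    order = np.argsort(d, axis=1)[:, :k + 1]          # k+1 nearest anchors
    d_sorted = np.take_along_axis(d, order, axis=1)   # (n, k+1), ascending
    d_k1 = d_sorted[:, [k]]                           # distance to (k+1)-th anchor
    denom = k * d_k1 - d_sorted[:, :k].sum(axis=1, keepdims=True) + 1e-12
    weights = (d_k1 - d_sorted[:, :k]) / denom
    Z = np.zeros_like(d)
    np.put_along_axis(Z, order[:, :k], weights, axis=1)
    return Z

# Toy usage on synthetic 2-D data (not data from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
anchors = bkhk_anchors(X, depth=3)      # 2**3 = 8 anchors
Z = bipartite_graph(X, anchors, k=3)    # sparse point-to-anchor affinities
```

The sketch builds a dense distance matrix for readability; a large-scale version would avoid that. Steps 3) and 4), the FSSF probability model and the label propagation, are not sketched because the abstract does not specify them.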