Zhang Junpu, Li Liang, Zhang Pei, Liu Yue, Wang Siwei, Zhou Changbao, Liu Xinwang, Zhu En
IEEE Trans Neural Netw Learn Syst. 2025 May;36(5):9592-9605. doi: 10.1109/TNNLS.2024.3435058. Epub 2025 May 2.
Clustering is a widely used unsupervised learning technique for discovering latent groupings in data. As a representative paradigm in multiple kernel clustering (MKC), late-fusion-based models learn a consistent partition across multiple base kernels. Despite their promising performance, a common concern is their limited representation capacity, caused by an inflexible fusion mechanism: the representations are constrained by a rank-k truncated eigenvalue decomposition (EVD) that does not fully exploit the available information. An intuitive way to alleviate this concern is to generate a set of augmented partitions and then select the optimal one by fine-tuning. However, this approach is limited in that it: 1) introduces undesired hyperparameters and dataset-dependent outcomes; 2) neglects the rich information spread across the diverse partitions; and 3) incurs high parameter-tuning costs. To address these problems, we propose transforming the challenging problem of directly determining the optimal partition (optimal parameter) into a diverse-partition-fusion (parameter-ensemble) problem. We design a novel, flexible fusion mechanism, tuning-free multiple kernel clustering coupled with diverse partition fusion (TFMKC), which reweights the diverse partitions through optimization. TFMKC achieves an optimal consensus partition by integrating diverse and complementary information rather than by traditional fine-tuning, which distinguishes our work from existing methods. Extensive experiments verify that TFMKC achieves competitive effectiveness and efficiency compared with baseline methods. The code can be accessed at https://github.com/ZJP/TFMKC.
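To make the late-fusion setting concrete, the following is a minimal sketch of the generic pipeline the abstract describes: each base kernel yields a partition matrix via rank-k truncated EVD, and a consensus partition is formed by reweighting and fusing the individual partitions. The alignment-based weight update below is an illustrative heuristic, not the actual TFMKC optimization (whose objective is not given in the abstract); all function names and parameters here are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma):
    # Gaussian (RBF) base kernel from pairwise squared distances
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def truncated_evd_partition(K, k):
    # Rank-k truncated EVD: the top-k eigenvectors of a symmetric
    # kernel matrix serve as an n-by-k base partition matrix
    w, V = np.linalg.eigh(K)   # eigenvalues in ascending order
    return V[:, -k:]

def fuse_partitions(partitions, n_iter=20):
    # Illustrative weighted late fusion (NOT the TFMKC objective):
    # alternate between a consensus partition (top-k EVD of the
    # weighted sum of H_i H_i^T) and weights proportional to each
    # partition's alignment with the current consensus.
    n, k = partitions[0].shape
    beta = np.full(len(partitions), 1.0 / len(partitions))
    for _ in range(n_iter):
        S = sum(b * (H @ H.T) for b, H in zip(beta, partitions))
        F = truncated_evd_partition(S, k)         # consensus partition
        align = np.array([np.linalg.norm(F.T @ H) for H in partitions])
        beta = align / align.sum()                # normalized weights
    return F, beta

# Toy usage: three RBF kernels with different bandwidths on 2 blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(3.0, 0.3, (20, 2))])
kernels = [rbf_kernel(X, g) for g in (0.1, 1.0, 10.0)]
parts = [truncated_evd_partition(K, 2) for K in kernels]
F, beta = fuse_partitions(parts)
```

The consensus matrix `F` would then be clustered (e.g., row-wise k-means) to obtain labels; the learned weights `beta` indicate how much each base partition contributes to the consensus.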