Zamora Juan, Sublime Jérémie
Instituto de Estadística, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2830, Valparaíso 2340025, Chile.
ISEP-School of Digital Engineers, 92130 Issy-Les-Moulineaux, France.
Entropy (Basel). 2023 Feb 17;25(2):371. doi: 10.3390/e25020371.
The ability to build a more robust clustering from many clustering models with different solutions is relevant in scenarios with privacy-preserving constraints, where data features differ in nature, or where these features are not available in a single computation unit. Additionally, with the rapid growth of multi-view data, and of clustering algorithms capable of producing a wide variety of representations for the same objects, merging clustering partitions into a single clustering result has become a complex problem with numerous applications. To tackle this problem, we propose a clustering fusion algorithm that takes existing clustering partitions acquired from multiple vector space models, sources, or views and merges them into a single partition. Our merging method relies on an information theory model based on Kolmogorov complexity that was originally proposed for unsupervised multi-view learning. The proposed algorithm features a stable merging process and shows competitive results on several real and artificial datasets in comparison with other state-of-the-art methods that have similar goals.