Wu Ou, Hu Weiming, Maybank Stephen J, Zhu Mingliang, Li Bing
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China.
IEEE Trans Syst Man Cybern B Cybern. 2012 Jun;42(3):913-26. doi: 10.1109/TSMCB.2012.2183591. Epub 2012 Feb 10.
Clustering aggregation, known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. The algorithms are inefficient if the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical bases of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. The experimental results obtained using several public data sets show that the new algorithms have lower computational complexity than three well-known existing point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch); nevertheless, the new algorithms do not sacrifice the accuracy.
聚类聚合,也称为聚类集成,已成为一种强大的技术,用于组合不同的聚类结果以获得单个更好的聚类。现有的聚类聚合算法直接应用于数据点,即所谓的基于点的方法。如果数据点数量很大,这些算法效率低下。我们定义了一种基于数据片段的高效聚类聚合方法。在这种基于片段的方法中,数据片段是数据的任何子集,且该子集不会被任何聚类结果分割。为了建立所提出方法的理论基础,我们证明了在从文献中选取的两种广泛使用的聚类聚合优良度量下,可以直接对数据片段执行聚类聚合。描述了三种新的聚类聚合算法。使用几个公共数据集获得的实验结果表明,新算法的计算复杂度低于三种著名的现有基于点的聚类聚合算法(凝聚式、最远点式和局部搜索式);然而,新算法并没有牺牲准确性。