Efficient clustering aggregation based on data fragments

Wu Ou, Hu Weiming, Maybank Stephen J, Zhu Mingliang, Li Bing

National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China.

IEEE Trans Syst Man Cybern B Cybern. 2012 Jun;42(3):913-26. doi: 10.1109/TSMCB.2012.2183591. Epub 2012 Feb 10.

Clustering aggregation, also known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. These algorithms are inefficient when the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical basis of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. Experimental results on several public data sets show that the new algorithms have lower computational complexity than three well-known existing point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch), and that they do not sacrifice accuracy.
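The fragment definition in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `data_fragments` and the input layout (one label list per clustering) are assumptions made for illustration. Two points are placed in the same fragment exactly when every input clustering gives them the same label, which yields the largest subsets that no clustering splits.

```python
from collections import defaultdict


def data_fragments(clusterings):
    """Group point indices into maximal data fragments.

    `clusterings` is a list of label sequences, one per clustering,
    all of the same length (hypothetical input format, not from the paper).
    """
    n = len(clusterings[0])
    if any(len(labels) != n for labels in clusterings):
        raise ValueError("all clusterings must label the same points")

    groups = defaultdict(list)
    for i in range(n):
        # The tuple of labels a point receives across all clusterings.
        # Points sharing this tuple are never separated by any clustering.
        signature = tuple(labels[i] for labels in clusterings)
        groups[signature].append(i)
    return list(groups.values())


# Three clusterings of six points (toy data).
clusterings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 0, 1, 1],
]
fragments = data_fragments(clusterings)
print(fragments)  # [[0, 1], [2], [3], [4, 5]]
```

Here six points collapse into four fragments. An aggregation step that operates on fragments instead of points would then work on fewer items, and the saving grows when the input clusterings largely agree.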


Similar articles

Efficient clustering aggregation based on data fragments.

IEEE Trans Syst Man Cybern B Cybern. 2012 Jun;42(3):913-26. doi: 10.1109/TSMCB.2012.2183591. Epub 2012 Feb 10.

Scalable model-based clustering for large databases based on data summarization.

IEEE Trans Pattern Anal Mach Intell. 2005 Nov;27(11):1710-9. doi: 10.1109/TPAMI.2005.226.

Fast graph-based relaxed clustering for large data sets using minimal enclosing ball.

IEEE Trans Syst Man Cybern B Cybern. 2012 Jun;42(3):672-87. doi: 10.1109/TSMCB.2011.2172604. Epub 2012 Feb 3.

Combining multiple clusterings using evidence accumulation.

IEEE Trans Pattern Anal Mach Intell. 2005 Jun;27(6):835-50. doi: 10.1109/TPAMI.2005.113.

Tailored aggregation for classification.

IEEE Trans Pattern Anal Mach Intell. 2009 Nov;31(11):2098-105. doi: 10.1109/TPAMI.2009.55.

CA-tree: a hierarchical structure for efficient and scalable coassociation-based cluster ensembles.

IEEE Trans Syst Man Cybern B Cybern. 2011 Jun;41(3):686-98. doi: 10.1109/TSMCB.2010.2086059. Epub 2010 Nov 11.

General C-means clustering model.

IEEE Trans Pattern Anal Mach Intell. 2005 Aug;27(8):1197-211. doi: 10.1109/TPAMI.2005.160.

FINE: fisher information nonparametric embedding.

IEEE Trans Pattern Anal Mach Intell. 2009 Nov;31(11):2093-8. doi: 10.1109/TPAMI.2009.67.

IEEE Trans Pattern Anal Mach Intell. 2004 Apr;26(4):434-48. doi: 10.1109/TPAMI.2004.1265860.

On weighting clustering.

IEEE Trans Pattern Anal Mach Intell. 2006 Aug;28(8):1223-35. doi: 10.1109/TPAMI.2006.168.

Cited by

An Effective Collaborative Mobile Weighted Clustering Schemes for Energy Balancing in Wireless Sensor Networks.

Sensors (Basel). 2016 Feb 19;16(2):261. doi: 10.3390/s16020261.
