A Scalable Framework For Cluster Ensembles.

Authors

Hore Prodip, Hall Lawrence O, Goldgof Dmitry B

Affiliation

Department of Computer Science and Engineering, ENB118, University of South Florida, Tampa, Florida 33620.

Publication

Pattern Recognit. 2009 May;42(5):676-688. doi: 10.1016/j.patcog.2008.09.027.

Abstract

An ensemble of clustering solutions or partitions may be generated for a number of reasons. If the data set is very large, clustering may be done on tractable size disjoint subsets. The data may be distributed at different sites for which a distributed clustering solution with a final merging of partitions is a natural fit. In this paper, two new approaches to combining partitions, represented by sets of cluster centers, are introduced. The advantage of these approaches is that they provide a final partition of data that is comparable to the best existing approaches, yet scale to extremely large data sets. They can be 100,000 times faster while using much less memory. The new algorithms are compared against the best existing cluster ensemble merging approaches, clustering all the data at once and a clustering algorithm designed for very large data sets. The comparison is done for fuzzy and hard k-means based clustering algorithms. It is shown that the centroid-based ensemble merging algorithms presented here generate partitions of quality comparable to the best label vector approach or clustering all the data at once, while providing very large speedups.
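The abstract does not describe the two merging algorithms in detail, so the following is only a minimal sketch of the general idea it names: cluster disjoint subsets separately, keep just the cluster centers, and merge those centers into one final set. The merging step used here (a weighted k-means over the pooled centers) is a simple stand-in chosen for illustration, not the authors' method, and all function names (`kmeans`, `merge_centroid_ensemble`, `assign`) are hypothetical.

```python
import numpy as np


def kmeans(points, k, weights=None, n_iter=50, seed=0):
    """Plain (optionally weighted) Lloyd's k-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    if weights is None:
        weights = np.ones(len(points))
    # Initialise the centers from k distinct input points.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(points, centers)
        for j in range(k):
            mask = labels == j
            if mask.any():  # keep the old center if a cluster goes empty
                centers[j] = np.average(points[mask], axis=0, weights=weights[mask])
    # Recompute labels so they match the returned centers.
    return centers, assign(points, centers)


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    points = np.asarray(points, dtype=float)
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def merge_centroid_ensemble(subsets, k, seed=0):
    """Cluster each disjoint subset, then cluster the pooled centers.

    Assumption for illustration: each center is weighted by the number of
    points it represents, and the pooled centers are merged with weighted
    k-means. The paper's own merging algorithms may differ.
    """
    pooled_centers, pooled_weights = [], []
    for i, subset in enumerate(subsets):
        centers, labels = kmeans(subset, k, seed=seed + i)
        counts = np.bincount(labels, minlength=k)
        keep = counts > 0  # drop centers that ended up with no points
        pooled_centers.append(centers[keep])
        pooled_weights.append(counts[keep])
    pooled_centers = np.vstack(pooled_centers)
    pooled_weights = np.concatenate(pooled_weights).astype(float)
    final_centers, _ = kmeans(pooled_centers, k, weights=pooled_weights, seed=seed)
    return final_centers
```

Only the pooled centers and their counts reach the merging step, so its cost depends on the number of subsets and clusters rather than on the number of data points. A final partition of the full data can then be obtained with a single pass, for example `assign(data, merge_centroid_ensemble(subsets, k))`.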


Similar articles

1
A Scalable Framework For Cluster Ensembles.
Pattern Recognit. 2009 May;42(5):676-688. doi: 10.1016/j.patcog.2008.09.027.
3
Combining multiple clusterings using evidence accumulation.
IEEE Trans Pattern Anal Mach Intell. 2005 Jun;27(6):835-50. doi: 10.1109/TPAMI.2005.113.
5
The k partition-distance problem.
J Comput Biol. 2012 Apr;19(4):404-17. doi: 10.1089/cmb.2010.0186.
7
Clustering ensembles: models of consensus and weak partitions.
IEEE Trans Pattern Anal Mach Intell. 2005 Dec;27(12):1866-81. doi: 10.1109/TPAMI.2005.237.
