Combining multiple clusterings using evidence accumulation.

Author information

Fred Ana L N, Jain Anil K

Affiliation

Instituto Superior Técnico, Instituto de Telecomunicações, Av. Rovisco Pais, 1049-001 Lisboa, Portugal.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2005 Jun;27(6):835-50. doi: 10.1109/TPAMI.2005.113.

DOI:10.1109/TPAMI.2005.113

PMID:15943417

Abstract

We explore the idea of evidence accumulation (EAC) for combining the results of multiple clusterings. First, a clustering ensemble, that is, a set of object partitions, is produced. Given a data set (n objects or patterns in d dimensions), data partitions can be produced in different ways: 1) by applying different clustering algorithms, and 2) by applying the same clustering algorithm with different parameter values or initializations. Further, combining different data representations (feature spaces) with different clustering algorithms can also provide a multitude of significantly different data partitionings. We propose a simple framework for extracting a consistent clustering from the various partitions in a clustering ensemble. According to the EAC concept, each partition is viewed as independent evidence of data organization; individual data partitions are combined, based on a voting mechanism, to generate a new n x n similarity matrix between the n patterns. The final data partition of the n patterns is obtained by applying a hierarchical agglomerative clustering algorithm to this matrix. We have developed a theoretical framework for the analysis of the proposed clustering combination strategy and its evaluation, based on the concept of mutual information between data partitions. Stability of the results is evaluated using bootstrapping techniques. A detailed discussion is presented of an evidence accumulation-based clustering algorithm that uses a split-and-merge strategy based on the K-means clustering algorithm. Experimental results of the proposed method on several synthetic and real data sets are compared with other combination strategies and with individual clustering results produced by well-known clustering algorithms.
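The voting step and the final hierarchical clustering described above can be illustrated with a small pure-Python sketch. It is not the authors' implementation, and several choices are assumptions made for illustration: the ensemble is built from plain k-means runs with a randomly drawn number of clusters k (the range is our choice), each partition casts one "vote" for every pair of patterns it puts in the same cluster, and the resulting co-association matrix (the fraction of partitions in which two patterns share a cluster) is clustered with single link and cut at a fixed similarity threshold of 0.5. The fixed threshold and all function names (`kmeans`, `co_association`, `single_link_cut`, `eac`) are hypothetical; the paper discusses its own choices of linkage and cut criterion.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int, rng: random.Random, iters: int = 50) -> List[int]:
    """Plain Lloyd's k-means seeded with k distinct data points as centroids."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid (squared Euclidean).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def co_association(partitions: List[List[int]], n: int) -> List[List[float]]:
    """C[i][j] = fraction of partitions in which patterns i and j share a cluster."""
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1  # one vote that i and j belong together
    total = len(partitions)
    return [[c / total for c in row] for row in counts]


def single_link_cut(C: List[List[float]], threshold: float) -> List[int]:
    """Single link on distance 1 - C, cut so that linked pairs have C > threshold.

    This equals the connected components of the graph with an edge wherever
    C[i][j] > threshold, so a union-find structure is enough.
    """
    n = len(C)
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if C[i][j] > threshold:
                parent[find(i)] = find(j)
    # Relabel components as 0, 1, 2, ... in order of first appearance.
    roots: dict = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def eac(points: List[Point], n_partitions: int = 30, k_range: Tuple[int, int] = (5, 10),
        threshold: float = 0.5, seed: int = 0) -> List[int]:
    """EAC sketch: k-means ensemble -> co-association matrix -> single-link cut.

    k is drawn uniformly from k_range (inclusive); its maximum must not exceed len(points).
    """
    rng = random.Random(seed)
    partitions = [kmeans(points, rng.randint(*k_range), rng) for _ in range(n_partitions)]
    return single_link_cut(co_association(partitions, len(points)), threshold)


if __name__ == "__main__":
    # Two well-separated synthetic blobs (made-up demo data).
    blob_a = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2)]
    blob_b = [(10.0, 10.0), (10.2, 10.1), (10.1, 10.3), (10.3, 10.2)]
    print(eac(blob_a + blob_b, n_partitions=40, k_range=(2, 3), seed=1))
```

Because single link is cut at a similarity threshold, a pair of patterns only needs to be joined through a chain of pairs that were grouped together in a majority of the partitions, which lets EAC recover elongated clusters that a single k-means run would split.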


Similar articles

Clustering ensembles: models of consensus and weak partitions.

IEEE Trans Pattern Anal Mach Intell. 2005 Dec;27(12):1866-81. doi: 10.1109/TPAMI.2005.237.

Evaluation of stability of k-means cluster ensembles with respect to random initialization.

IEEE Trans Pattern Anal Mach Intell. 2006 Nov;28(11):1798-808. doi: 10.1109/TPAMI.2006.226.

Cumulative voting consensus method for partitions with variable number of clusters.

IEEE Trans Pattern Anal Mach Intell. 2008 Jan;30(1):160-73. doi: 10.1109/TPAMI.2007.1138.

Automated variable weighting in k-means type clustering.

IEEE Trans Pattern Anal Mach Intell. 2005 May;27(5):657-68. doi: 10.1109/TPAMI.2005.95.

Object-based image analysis using multiscale connectivity.

IEEE Trans Pattern Anal Mach Intell. 2005 Jun;27(6):892-907. doi: 10.1109/TPAMI.2005.124.

A novel kernel method for clustering.

IEEE Trans Pattern Anal Mach Intell. 2005 May;27(5):801-5. doi: 10.1109/TPAMI.2005.88.

Simultaneous feature selection and clustering using mixture models.

IEEE Trans Pattern Anal Mach Intell. 2004 Sep;26(9):1154-66. doi: 10.1109/TPAMI.2004.71.

LEGClust: a clustering algorithm based on layered entropic subgraphs.

IEEE Trans Pattern Anal Mach Intell. 2008 Jan;30(1):62-75. doi: 10.1109/TPAMI.2007.1142.

A genetic algorithm using hyper-quadtrees for low-dimensional K-means clustering.

IEEE Trans Pattern Anal Mach Intell. 2006 Apr;28(4):533-43. doi: 10.1109/TPAMI.2006.66.

Cited by

scEVE: a single-cell RNA-seq ensemble clustering algorithm capitalizing on the differences of predictions between multiple clustering methods.

NAR Genom Bioinform. 2025 Jun 9;7(2):lqaf073. doi: 10.1093/nargab/lqaf073. eCollection 2025 Jun.

Characterize neuronal responses to natural movies in the mouse superior colliculus.

Front Cell Neurosci. 2025 Mar 11;19:1558504. doi: 10.3389/fncel.2025.1558504. eCollection 2025.

Statistical Significance of Clustering with Multidimensional Scaling.

J Comput Graph Stat. 2024;33(1):219-230. doi: 10.1080/10618600.2023.2219708. Epub 2023 Jul 20.

ECG arrhythmia classification based on the fast ant colony clustering algorithm with improved spatiotemporal feature perception ability.

Heliyon. 2024 Aug 28;10(17):e37111. doi: 10.1016/j.heliyon.2024.e37111. eCollection 2024 Sep 15.

Distinct brain morphometry patterns revealed by deep learning improve prediction of post-stroke aphasia severity.

Commun Med (Lond). 2024 Jun 12;4(1):115. doi: 10.1038/s43856-024-00541-8.

Machine learning with taxonomic family delimitation aids in the classification of ephemeral beaked whale events in passive acoustic monitoring.

PLoS One. 2024 Jun 4;19(6):e0304744. doi: 10.1371/journal.pone.0304744. eCollection 2024.

Unsupervised analysis of whole transcriptome data from human pluripotent stem cells cardiac differentiation.

Sci Rep. 2024 Feb 7;14(1):3110. doi: 10.1038/s41598-024-52970-z.

Dual-level clustering ensemble algorithm with three consensus strategies.

Sci Rep. 2023 Dec 18;13(1):22617. doi: 10.1038/s41598-023-49947-9.

Projecting genetic associations through gene expression patterns highlights disease etiology and drug mechanisms.

Nat Commun. 2023 Sep 9;14(1):5562. doi: 10.1038/s41467-023-41057-4.

Distinct brain morphometry patterns revealed by deep learning improve prediction of aphasia severity.

Res Sq. 2023 Jul 3:rs.3.rs-3126126. doi: 10.21203/rs.3.rs-3126126/v1.
