Clustering ensembles: models of consensus and weak partitions.

Authors

Topchy Alexander, Jain Anil K, Punch William

Affiliation

Nielsen Media Research, 501 Brooker Creek Blvd., Oldsmar, FL 34677, USA.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2005 Dec;27(12):1866-81. doi: 10.1109/TPAMI.2005.237.

DOI:10.1109/TPAMI.2005.237

PMID:16355656

Abstract

Clustering ensembles have emerged as a powerful method for improving both the robustness as well as the stability of unsupervised classification solutions. However, finding a consensus clustering from multiple partitions is a difficult problem that can be approached from graph-based, combinatorial, or statistical perspectives. This study extends previous research on clustering ensembles in several respects. First, we introduce a unified representation for multiple clusterings and formulate the corresponding categorical clustering problem. Second, we propose a probabilistic model of consensus using a finite mixture of multinomial distributions in a space of clusterings. A combined partition is found as a solution to the corresponding maximum-likelihood problem using the EM algorithm. Third, we define a new consensus function that is related to the classical intraclass variance criterion using the generalized mutual information definition. Finally, we demonstrate the efficacy of combining partitions generated by weak clustering algorithms that use data projections and random data splits. A simple explanatory model is offered for the behavior of combinations of such weak clustering components. Combination accuracy is analyzed as a function of several parameters that control the power and resolution of component partitions as well as the number of partitions. We also analyze clustering ensembles with incomplete information and the effect of missing cluster labels on the quality of overall consensus. Experimental results demonstrate the effectiveness of the proposed methods on several real-world data sets.
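
The abstract's second contribution, a finite mixture of multinomial distributions fitted by EM, can be illustrated with a small sketch. This is not the authors' code. It assumes that each object is described by its labels in H component partitions, that labels within a partition are conditionally independent given the consensus cluster, and that a missing label is simply skipped in the likelihood. The function name `em_consensus` and its parameters (`n_clusters`, `n_iter`, `seed`, `smooth`) are hypothetical names chosen for illustration.

```python
import math
import random


def em_consensus(labels, n_clusters, n_iter=50, seed=0, smooth=1e-6):
    """Fit a mixture of multinomials to a label matrix with EM.

    labels[i][j] is the label of object i in partition j, an int in
    0..k_j-1, or None if that label is missing (an assumption of this sketch).
    Returns one consensus cluster index per object.
    """
    rng = random.Random(seed)
    n, h = len(labels), len(labels[0])
    # Number of distinct labels used by each component partition.
    sizes = [max(row[j] for row in labels if row[j] is not None) + 1
             for j in range(h)]

    # Random soft initialisation of the responsibilities.
    resp = []
    for _ in range(n):
        w = [rng.random() + 1e-3 for _ in range(n_clusters)]
        total = sum(w)
        resp.append([x / total for x in w])

    for _ in range(n_iter):
        # M-step: mixing weights and per-partition multinomial parameters.
        alpha = [sum(r[m] for r in resp) / n for m in range(n_clusters)]
        theta = []
        for m in range(n_clusters):
            per_partition = []
            for j in range(h):
                counts = [smooth] * sizes[j]  # small prior avoids log(0)
                for i in range(n):
                    y = labels[i][j]
                    if y is not None:
                        counts[y] += resp[i][m]
                total = sum(counts)
                per_partition.append([c / total for c in counts])
            theta.append(per_partition)

        # E-step: posterior over consensus clusters, computed in log space.
        for i in range(n):
            logp = []
            for m in range(n_clusters):
                lp = math.log(alpha[m] + 1e-300)
                for j in range(h):
                    y = labels[i][j]
                    if y is not None:  # missing labels contribute nothing
                        lp += math.log(theta[m][j][y])
                logp.append(lp)
            top = max(logp)
            w = [math.exp(v - top) for v in logp]
            total = sum(w)
            resp[i] = [x / total for x in w]

    # Hard consensus partition: most probable component for each object.
    return [max(range(n_clusters), key=lambda m: r[m]) for r in resp]


# Three partitions of six objects; label names differ between partitions,
# and the last object has one missing label.
ensemble = [
    [0, 1, 0], [0, 1, 0], [0, 1, 0],
    [1, 0, 1], [1, 0, 1], [1, 0, None],
]
consensus = em_consensus(ensemble, n_clusters=2)
```

Because the model works on label co-occurrence rather than on label names, the sketch needs no explicit matching of cluster labels across partitions. The consensus indices it returns are themselves arbitrary, so results should be compared up to relabeling.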

Similar articles

Clustering ensembles: models of consensus and weak partitions.

IEEE Trans Pattern Anal Mach Intell. 2005 Dec;27(12):1866-81. doi: 10.1109/TPAMI.2005.237.

Combining multiple clusterings using evidence accumulation.

IEEE Trans Pattern Anal Mach Intell. 2005 Jun;27(6):835-50. doi: 10.1109/TPAMI.2005.113.

Cumulative voting consensus method for partitions with variable number of clusters.

IEEE Trans Pattern Anal Mach Intell. 2008 Jan;30(1):160-73. doi: 10.1109/TPAMI.2007.1138.

Evaluation of stability of k-means cluster ensembles with respect to random initialization.

IEEE Trans Pattern Anal Mach Intell. 2006 Nov;28(11):1798-808. doi: 10.1109/TPAMI.2006.226.

Simultaneous feature selection and clustering using mixture models.

IEEE Trans Pattern Anal Mach Intell. 2004 Sep;26(9):1154-66. doi: 10.1109/TPAMI.2004.71.

Object-based image analysis using multiscale connectivity.

IEEE Trans Pattern Anal Mach Intell. 2005 Jun;27(6):892-907. doi: 10.1109/TPAMI.2005.124.

A redundancy-based measure of dissimilarity among probability distributions for hierarchical clustering criteria.

IEEE Trans Pattern Anal Mach Intell. 2008 Jan;30(1):76-88. doi: 10.1109/TPAMI.2007.1160.

Generalizing Swendsen-Wang to sampling arbitrary posterior probabilities.

IEEE Trans Pattern Anal Mach Intell. 2005 Aug;27(8):1239-53. doi: 10.1109/TPAMI.2005.161.

LEGClust- a clustering algorithm based on layered entropic subgraphs.

IEEE Trans Pattern Anal Mach Intell. 2008 Jan;30(1):62-75. doi: 10.1109/TPAMI.2007.1142.

Localization of shapes using statistical models and stochastic optimization.

IEEE Trans Pattern Anal Mach Intell. 2007 Sep;29(9):1603-15. doi: 10.1109/TPAMI.2007.1157.

Cited by

A Practical Guide to Identifying Robust Clusters in Neuroimaging Data.

Hum Brain Mapp. 2025 Sep;46(13):e70330. doi: 10.1002/hbm.70330.

An Ensemble and Multi-View Clustering Method Based on Kolmogorov Complexity.

Entropy (Basel). 2023 Feb 17;25(2):371. doi: 10.3390/e25020371.

Stability estimation for unsupervised clustering: A review.

Wiley Interdiscip Rev Comput Stat. 2022 Nov-Dec;14(6):e1575. doi: 10.1002/wics.1575. Epub 2022 Jan 9.

Fast and interpretable consensus clustering via minipatch learning.

PLoS Comput Biol. 2022 Oct 3;18(10):e1010577. doi: 10.1371/journal.pcbi.1010577. eCollection 2022 Oct.

Operational Modes Detection in Industrial Gas Turbines Using an Ensemble of Clustering Methods.

Sensors (Basel). 2021 Dec 1;21(23):8047. doi: 10.3390/s21238047.

A review on food recognition technology for health applications.

Health Psychol Res. 2020 Dec 30;8(3):9297. doi: 10.4081/hpr.2020.9297.

KL Divergence-Based Fuzzy Cluster Ensemble for Image Segmentation.

Entropy (Basel). 2018 Apr 12;20(4):273. doi: 10.3390/e20040273.

GrpClassifierEC: a novel classification approach based on the ensemble clustering space.

Algorithms Mol Biol. 2020 Feb 13;15:3. doi: 10.1186/s13015-020-0162-7. eCollection 2020.

Positive and Negative Evidence Accumulation Clustering for Sensor Fusion: An Application to Heartbeat Clustering.

Sensors (Basel). 2019 Oct 24;19(21):4635. doi: 10.3390/s19214635.

Towards Real-Time Prediction of Freezing of Gait in Patients With Parkinson's Disease: Addressing the Class Imbalance Problem.

Sensors (Basel). 2019 Sep 10;19(18):3898. doi: 10.3390/s19183898.
