Cumulative voting consensus method for partitions with variable number of clusters.

Author information

Ayad Hanan G, Kamel Mohamed S

Affiliation

Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1.

Publication information

IEEE Trans Pattern Anal Mach Intell. 2008 Jan;30(1):160-73. doi: 10.1109/TPAMI.2007.1138.

Abstract

Over the past few years, there has been a renewed interest in the consensus clustering problem. Several new methods have been proposed for finding a consensus partition for a set of n data objects that optimally summarizes an ensemble. In this paper, we propose new consensus clustering algorithms with linear computational complexity in n. We consider clusterings generated with a random number of clusters, which we describe by categorical random variables. We introduce the idea of cumulative voting as a solution for the problem of cluster label alignment, where, unlike the common one-to-one voting scheme, a probabilistic mapping is computed. We seek a first summary of the ensemble that minimizes the average squared distance between the mapped partitions and the optimal representation of the ensemble, where the selection criterion of the reference clustering is defined based on maximizing the information content as measured by the entropy. We describe cumulative vote weighting schemes and corresponding algorithms to compute an empirical probability distribution summarizing the ensemble. Given the arbitrary number of clusters of the input partitions, we formulate the problem of extracting the optimal consensus as that of finding a compressed summary of the estimated distribution that preserves maximum relevant information. An efficient solution is obtained using an agglomerative algorithm that minimizes the average generalized Jensen-Shannon divergence within the cluster. The empirical study demonstrates significant gains in accuracy and superior performance compared to several recent consensus clustering algorithms.
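The abstract outlines a three-stage pipeline: pick a maximum-entropy reference partition, map every partition onto it with a probabilistic (cumulative) vote, then agglomeratively compress the resulting distribution. The Python sketch below illustrates that pipeline under simplifying assumptions that are ours, not the authors': the reference is fixed rather than adaptively updated; a cluster's vote for each reference cluster is simply the fraction of its members falling there (none of the paper's specific vote-weighting schemes is reproduced); p(x) is uniform; and the agglomeration merges reference clusters using the agglomerative information bottleneck merge cost (merged mass times a weighted Jensen-Shannon divergence). All function names (`entropy`, `cumulative_vote`, `merge_cost`, `consensus`) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (nats) of one partition's cluster-size distribution."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def cumulative_vote(ensemble):
    """Hypothetical sketch: map every partition onto a fixed max-entropy reference.

    Returns (P, ref_ids), where row P[x] is an empirical distribution over the
    reference clusters ref_ids. Simplification (not the paper's weighting): a
    cluster's vote for reference cluster b is the fraction of its members that
    the reference places in b. Cost is O(n * k0) per partition, i.e. linear in n.
    """
    ref = max(ensemble, key=entropy)
    ref_ids = sorted(set(ref))
    col = {c: j for j, c in enumerate(ref_ids)}
    n, k0 = len(ref), len(ref_ids)
    P = [[0.0] * k0 for _ in range(n)]
    for labels in ensemble:
        counts = {}  # cluster label -> member counts per reference cluster
        for a, b in zip(labels, ref):
            counts.setdefault(a, [0.0] * k0)[col[b]] += 1.0
        votes = {a: [v / sum(row) for v in row] for a, row in counts.items()}
        for x, a in enumerate(labels):
            for j in range(k0):
                P[x][j] += votes[a][j] / len(ensemble)
    return P, ref_ids


def _h(dist):
    """Shannon entropy (nats), skipping zero entries."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def merge_cost(ja, jb):
    """aIB-style merge cost: (p(a) + p(b)) * JS_pi(p(x|a), p(x|b)).

    ja, jb are joint columns p(x, c); pi = (p(a), p(b)) / (p(a) + p(b)).
    Columns are never all-zero here because the reference votes for itself.
    """
    pa, pb = sum(ja), sum(jb)
    w = pa + pb
    ca = [v / pa for v in ja]
    cb = [v / pb for v in jb]
    mix = [(pa * u + pb * v) / w for u, v in zip(ca, cb)]
    js = _h(mix) - (pa / w) * _h(ca) - (pb / w) * _h(cb)
    return w * js


def consensus(ensemble, k):
    """Greedily merge reference clusters until k remain, then assign objects."""
    P, _ = cumulative_vote(ensemble)
    n = len(P)
    # Joint columns p(x, c) = p(c | x) * p(x), assuming uniform p(x) = 1/n.
    groups = [[P[x][j] / n for x in range(n)] for j in range(len(P[0]))]
    while len(groups) > k:
        i, j = min(
            ((i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))),
            key=lambda ij: merge_cost(groups[ij[0]], groups[ij[1]]),
        )
        merged = [u + v for u, v in zip(groups[i], groups[j])]
        groups = [g for t, g in enumerate(groups) if t not in (i, j)] + [merged]
    # Each object joins the merged cluster that holds most of its probability mass.
    return [max(range(len(groups)), key=lambda g: groups[g][x]) for x in range(n)]


ensemble = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 2, 3, 3],  # highest entropy, so it becomes the reference
]
print(consensus(ensemble, k=2))
```

On this toy ensemble the two-way consensus puts objects 0 to 2 in one group and objects 3 to 5 in the other; which group receives label 0 is arbitrary. Merging reference clusters rather than individual objects keeps the cost linear in n, in line with the complexity the abstract states, but whether the paper agglomerates at exactly this level is an assumption of this sketch.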
