IEEE Trans Cybern. 2018 May;48(5):1460-1473. doi: 10.1109/TCYB.2017.2702343. Epub 2017 May 23.
Ensemble clustering combines multiple base clusterings into a single, potentially better and more robust clustering, and it has attracted increasing attention in recent years. Despite this success, most existing ensemble clustering methods treat all base clusterings equally regardless of their reliability, which makes them vulnerable to low-quality base clusterings. Some efforts have been made to evaluate and weight the base clusterings globally, but these methods tend to treat each base clustering as an indivisible whole and neglect the local diversity of clusters within the same base clustering. It remains an open problem how to evaluate the reliability of individual clusters and exploit the local diversity in the ensemble to improve consensus performance, especially when there is no access to data features and no specific assumptions about the data distribution can be made. To address this, we propose a novel ensemble clustering approach based on ensemble-driven cluster uncertainty estimation and a local weighting strategy. In particular, the uncertainty of each cluster is estimated from the cluster labels in the entire ensemble via an entropic criterion. A novel ensemble-driven cluster validity measure is introduced, and a locally weighted co-association matrix is presented as a summary of the ensemble of diverse clusters. With the local diversity in the ensemble exploited, two novel consensus functions are further proposed. Extensive experiments on a variety of real-world datasets demonstrate the superiority of the proposed approach over state-of-the-art methods.
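The abstract does not state its formulas, so the following Python sketch is only one plausible reading of the ideas it names, not the authors' implementation. It assumes that a cluster's uncertainty is the sum, over all base clusterings, of the entropy of the labels its members receive; that uncertainty is mapped to a reliability weight by a decaying exponential with a hypothetical parameter `theta`; and that each co-occurrence of two objects in a cluster contributes that cluster's weight to the co-association matrix. The function names are illustrative.

```python
import math
from collections import Counter


def cluster_entropy(members, ensemble):
    """Sum, over all base clusterings, of the label entropy inside one cluster.

    `members` is a list of object indices; `ensemble` is a list of label lists.
    A value of 0 means every base clustering keeps the members together.
    """
    total = 0.0
    for labels in ensemble:
        counts = Counter(labels[i] for i in members)
        for count in counts.values():
            p = count / len(members)
            total -= p * math.log2(p)
    return total


def locally_weighted_coassociation(ensemble, theta=0.5):
    """Co-association matrix in which each cluster votes with its own weight.

    Assumption: weight = exp(-entropy / (theta * M)), with M base clusterings.
    `theta` is a hypothetical tuning parameter, not taken from the abstract.
    """
    n = len(ensemble[0])
    m = len(ensemble)
    matrix = [[0.0] * n for _ in range(n)]
    for labels in ensemble:
        # Group object indices by their cluster label in this base clustering.
        clusters = {}
        for i, label in enumerate(labels):
            clusters.setdefault(label, []).append(i)
        for members in clusters.values():
            entropy = cluster_entropy(members, ensemble)
            weight = math.exp(-entropy / (theta * m))
            for i in members:
                for j in members:
                    matrix[i][j] += weight / m
    return matrix


# Two base clusterings of four objects that disagree about object 1.
example = locally_weighted_coassociation([[0, 0, 1, 1], [0, 1, 1, 1]])
```

Under these assumptions, a cluster that the rest of the ensemble splits apart receives a weight below 1, so the pairs it groups count for less than pairs grouped by a cluster the ensemble agrees on. The resulting matrix only needs the cluster labels, which matches the setting described above where data features are unavailable; it could then be passed to any similarity-based consensus step, such as hierarchical agglomerative clustering.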