Toward Multidiversified Ensemble Clustering of High-Dimensional Data: From Subspaces to Metrics and Beyond.

Author information

Huang Dong, Wang Chang-Dong, Lai Jian-Huang, Kwoh Chee-Keong

Publication information

IEEE Trans Cybern. 2022 Nov;52(11):12231-12244. doi: 10.1109/TCYB.2021.3049633. Epub 2022 Oct 17.

DOI:10.1109/TCYB.2021.3049633

Abstract

The rapid emergence of high-dimensional data in various areas has brought new challenges to current ensemble clustering research. To deal with the curse of dimensionality, considerable efforts have recently been made in ensemble clustering by means of different subspace-based techniques. However, besides the emphasis on subspaces, rather limited attention has been paid to the potential diversity in similarity/dissimilarity metrics. It remains a surprisingly open problem in ensemble clustering how to create and aggregate a large population of diversified metrics, and furthermore, how to jointly investigate the multilevel diversity in the large populations of metrics, subspaces, and clusters in a unified framework. To tackle this problem, this article proposes a novel multidiversified ensemble clustering approach. In particular, we create a large number of diversified metrics by randomizing a scaled exponential similarity kernel, which are then coupled with random subspaces to form a large set of metric-subspace pairs. Based on the similarity matrices derived from these metric-subspace pairs, an ensemble of diversified base clusterings can thereby be constructed. Furthermore, an entropy-based criterion is utilized to explore the cluster-wise diversity in ensembles, based on which three specific ensemble clustering algorithms are presented by incorporating three types of consensus functions. Extensive experiments are conducted on 30 high-dimensional datasets, including 18 cancer gene expression datasets and 12 image/speech datasets, which demonstrate the superiority of our algorithms over the state of the art. The source code is available at https://github.com/huangdonghere/MDEC.
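
The metric-subspace pairing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the MDEC repository for that). It assumes the scaled exponential similarity kernel in the form commonly used for similarity network fusion, exp(-d^2 / (mu * eps)), and it assumes that the randomized quantities are the neighborhood size `k` and the scale `mu`. The abstract does not specify which parameters are randomized or over what ranges, so the ranges, the subspace ratio, and the function names below are all hypothetical.

```python
import numpy as np

def scaled_exp_similarity(X, k, mu):
    """Scaled exponential similarity kernel (one common form, assumed here)."""
    # Squared and plain pairwise Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, 0.0)
    d = np.sqrt(d2)
    # Mean distance from each point to its k nearest neighbours
    # (column 0 of the sorted rows is the point itself, so it is skipped).
    knn_mean = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)
    # Local scale for every pair (i, j).
    eps = (knn_mean[:, None] + knn_mean[None, :] + d) / 3.0
    return np.exp(-d2 / (mu * eps + 1e-12))

def metric_subspace_pairs(X, n_pairs, subspace_ratio=0.5,
                          k_range=(5, 20), mu_range=(0.3, 0.8), seed=0):
    """Build one similarity matrix per random (metric, subspace) pair.

    All default ranges are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    n, dim = X.shape
    n_feat = max(1, int(round(subspace_ratio * dim)))
    k_hi = max(k_range[0], min(k_range[1], n - 1))
    sims = []
    for _ in range(n_pairs):
        # Random subspace: a subset of features drawn without replacement.
        feats = rng.choice(dim, size=n_feat, replace=False)
        # Random metric: kernel parameters drawn from the assumed ranges.
        k = int(rng.integers(k_range[0], k_hi + 1))
        mu = rng.uniform(*mu_range)
        sims.append(scaled_exp_similarity(X[:, feats], k, mu))
    return sims

# Usage on synthetic data: 30 samples with 200 features.
X_demo = np.random.default_rng(42).normal(size=(30, 200))
sims_demo = metric_subspace_pairs(X_demo, n_pairs=4)
```

Each matrix in `sims_demo` could then be passed to any similarity-based clusterer to produce one base clustering. The entropy-based criterion and the consensus functions mentioned in the abstract are outside the scope of this sketch.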

Similar articles

Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis.

BMC Bioinformatics. 2019 Dec 24;20(Suppl 19):660. doi: 10.1186/s12859-019-3179-5.

EnsCat: clustering of categorical data via ensembling.

BMC Bioinformatics. 2016 Sep 15;17(1):380. doi: 10.1186/s12859-016-1245-9.

Locally Weighted Ensemble Clustering.

IEEE Trans Cybern. 2018 May;48(5):1460-1473. doi: 10.1109/TCYB.2017.2702343. Epub 2017 May 23.

Evolutionary Multiobjective Clustering Algorithms With Ensemble for Patient Stratification.

IEEE Trans Cybern. 2022 Oct;52(10):11027-11040. doi: 10.1109/TCYB.2021.3069434. Epub 2022 Sep 19.

Hybrid fuzzy cluster ensemble framework for tumor clustering from biomolecular data.

IEEE/ACM Trans Comput Biol Bioinform. 2013 May-Jun;10(3):657-70. doi: 10.1109/TCBB.2013.59.

A Clustering Ensemble Method for Cell Type Detection by Multiobjective Particle Optimization.

IEEE/ACM Trans Comput Biol Bioinform. 2023 Jan-Feb;20(1):1-14. doi: 10.1109/TCBB.2021.3132400. Epub 2023 Feb 3.

Fuzzy ensemble clustering based on random projections for DNA microarray data analysis.

Artif Intell Med. 2009 Feb-Mar;45(2-3):173-83. doi: 10.1016/j.artmed.2008.07.014. Epub 2008 Sep 17.

Tensorized Incomplete Multi-view Kernel Subspace Clustering.

Neural Netw. 2024 Nov;179:106529. doi: 10.1016/j.neunet.2024.106529. Epub 2024 Jul 9.

Single-cell RNA-seq interpretations using evolutionary multiobjective ensemble pruning.

Bioinformatics. 2019 Aug 15;35(16):2809-2817. doi: 10.1093/bioinformatics/bty1056.

Cited by

Multi-view clustering by CPS-merge analysis with application to multimodal single-cell data.

PLoS Comput Biol. 2023 Apr 17;19(4):e1011044. doi: 10.1371/journal.pcbi.1011044. eCollection 2023 Apr.

An Ensemble and Multi-View Clustering Method Based on Kolmogorov Complexity.

Entropy (Basel). 2023 Feb 17;25(2):371. doi: 10.3390/e25020371.

Secuer: Ultrafast, scalable and accurate clustering of single-cell RNA-seq data.

PLoS Comput Biol. 2022 Dec 5;18(12):e1010753. doi: 10.1371/journal.pcbi.1010753. eCollection 2022 Dec.
