Graph-based consensus clustering for class discovery from gene expression data.

Author information

Yu Zhiwen, Wong Hau-San, Wang Hongqiang

Affiliation

Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong.

Publication information

Bioinformatics. 2007 Nov 1;23(21):2888-96. doi: 10.1093/bioinformatics/btm463. Epub 2007 Sep 14.

DOI:10.1093/bioinformatics/btm463

PMID:17872912

Abstract

MOTIVATION

Consensus clustering, also known as cluster ensemble, is one of the important techniques for microarray data analysis and is particularly useful for class discovery from microarray data. Compared with traditional clustering algorithms, consensus clustering approaches can integrate multiple partitions from different clustering solutions to improve the robustness, stability, scalability and parallelization of the clustering algorithms. Using consensus clustering, one can discover the underlying classes of the samples in gene expression data.
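
As a rough illustration of how multiple partitions can be integrated, the Python sketch below builds a co-association matrix (the fraction of base partitions that place each pair of samples in the same cluster) and then treats it as a graph: samples are linked when their co-association exceeds a threshold, and the connected components become the consensus classes. This is a generic evidence-accumulation style consensus, not the authors' GCC algorithm, whose details are not given in the abstract. The function names `co_association` and `consensus_components` and the default threshold of 0.5 are hypothetical choices for illustration.

```python
from itertools import combinations


def co_association(partitions):
    """Return an n x n matrix: share of base partitions that co-cluster samples i and j."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_components(matrix, threshold=0.5):
    """Link samples whose co-association exceeds `threshold`; return component labels."""
    n = len(matrix)
    parent = list(range(n))  # union-find forest over samples

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    root_to_label, labels = {}, []
    for i in range(n):
        labels.append(root_to_label.setdefault(find(i), len(root_to_label)))
    return labels


# Three hypothetical base partitions of five samples.
parts = [[0, 0, 1, 1, 2], [1, 1, 0, 0, 0], [0, 0, 0, 1, 1]]
print(consensus_components(co_association(parts)))  # [0, 0, 1, 1, 1]
```

Here samples 0 and 1 are co-clustered in all three partitions, and samples 2, 3 and 4 are chained by pairs co-clustered in two of three, so the consensus yields two classes even though no single base partition does exactly that.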

RESULTS

In addition to exploring a graph-based consensus clustering (GCC) algorithm to estimate the underlying classes of the samples in microarray data, we also design a new validation index to determine the number of classes in microarray data. To our knowledge, this is the first time GCC has been applied to class discovery for microarray data. Given a prespecified maximum number of classes (denoted as K(max) in this article), our algorithm can discover the true number of classes for the samples in microarray data according to a new cluster validation index called the Modified Rand Index. Experiments on gene expression data indicate that our new algorithm can (i) outperform most of the existing algorithms, (ii) correctly identify the number of classes in real cancer datasets, and (iii) discover classes of samples with biological meaning.
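
The abstract does not define the Modified Rand Index, so the following sketch only illustrates the general idea of scanning candidate class numbers from 2 up to K(max) and keeping the one with the highest agreement score. As a stand-in it uses the plain (unadjusted) Rand index, averaged over all pairs of repeated clusterings obtained at each candidate k. The input format `runs_by_k` (a mapping from k to a list of label vectors) and the helper names are hypothetical.

```python
from itertools import combinations


def rand_index(a, b):
    """Plain Rand index: fraction of sample pairs on which two labelings agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def mean_pairwise_rand(partitions):
    """Average Rand index over all pairs of partitions (needs at least two)."""
    scores = [rand_index(p, q) for p, q in combinations(partitions, 2)]
    return sum(scores) / len(scores)


def select_k(runs_by_k):
    """Pick the candidate k whose repeated clusterings agree most with each other."""
    return max(runs_by_k, key=lambda k: mean_pairwise_rand(runs_by_k[k]))


# Hypothetical repeated clusterings of four samples for k = 2 and k = 3.
runs_by_k = {
    2: [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]],
    3: [[0, 1, 2, 2], [0, 0, 1, 2], [2, 1, 0, 0]],
}
print(select_k(runs_by_k))  # 2
```

The three k = 2 runs agree exactly up to a relabeling, so their mean Rand index is 1.0 and k = 2 is selected. Note that the plain Rand index tends to drift upward as k grows, because most sample pairs end up in different clusters in both labelings; treat this only as an illustration of the scan over k, not as the paper's criterion.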

AVAILABILITY

Matlab source code for the GCC algorithm is available upon request from Zhiwen Yu.

Similar articles

A mixture model with random-effects components for clustering correlated gene-expression profiles.

Bioinformatics. 2006 Jul 15;22(14):1745-52. doi: 10.1093/bioinformatics/btl165. Epub 2006 May 3.

Class discovery from gene expression data based on perturbation and cluster ensemble.

IEEE Trans Nanobioscience. 2009 Jun;8(2):147-60. doi: 10.1109/TNB.2009.2023321. Epub 2009 Jun 2.

Clustering of change patterns using Fourier coefficients.

Bioinformatics. 2008 Jan 15;24(2):184-91. doi: 10.1093/bioinformatics/btm568. Epub 2007 Nov 19.

Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach.

Bioinformatics. 2007 Jul 1;23(13):1607-15. doi: 10.1093/bioinformatics/btm158. Epub 2007 May 5.

Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm.

Bioinformatics. 2006 Jan 1;22(1):58-67. doi: 10.1093/bioinformatics/bti746. Epub 2005 Oct 27.

A multi-stage approach to clustering and imputation of gene expression profiles.

Bioinformatics. 2007 Apr 15;23(8):998-1005. doi: 10.1093/bioinformatics/btm053. Epub 2007 Feb 18.

An iterative data mining approach for mining overlapping coexpression patterns in noisy gene expression data.

IEEE Trans Nanobioscience. 2009 Sep;8(3):252-8. doi: 10.1109/TNB.2009.2026747. Epub 2009 Jul 14.

Divisive Correlation Clustering Algorithm (DCCA) for grouping of genes: detecting varying patterns in expression profiles.

Bioinformatics. 2008 Jun 1;24(11):1359-66. doi: 10.1093/bioinformatics/btn133. Epub 2008 Apr 10.

An improved algorithm for clustering gene expression data.

Bioinformatics. 2007 Nov 1;23(21):2859-65. doi: 10.1093/bioinformatics/btm418. Epub 2007 Aug 25.

Cited by

Cross-talk of m6A methylation modification and the tumor microenvironment composition in esophageal cancer.

Front Immunol. 2025 Jul 7;16:1572810. doi: 10.3389/fimmu.2025.1572810. eCollection 2025.

VIASCKDE Index: A Novel Internal Cluster Validity Index for Arbitrary-Shaped Clusters Based on the Kernel Density Estimation.

Comput Intell Neurosci. 2022 Jun 8;2022:4059302. doi: 10.1155/2022/4059302. eCollection 2022.

ClusterMine: A knowledge-integrated clustering approach based on expression profiles of gene sets.

J Bioinform Comput Biol. 2020 Jun;18(3):2040009. doi: 10.1142/S0219720020400090.

Overlapping clustering of gene expression data using penalized weighted normalized cut.

Genet Epidemiol. 2018 Dec;42(8):796-811. doi: 10.1002/gepi.22164. Epub 2018 Oct 9.

Assisted gene expression-based clustering with AWNCut.

Stat Med. 2018 Dec 20;37(29):4386-4403. doi: 10.1002/sim.7928. Epub 2018 Aug 9.

Cluster ensemble based on Random Forests for genetic data.

BioData Min. 2017 Dec 15;10:37. doi: 10.1186/s13040-017-0156-2. eCollection 2017.

Spectral clustering using Nyström approximation for the accurate identification of cancer molecular subtypes.

Sci Rep. 2017 Jul 7;7(1):4896. doi: 10.1038/s41598-017-05275-3.

Tradict enables accurate prediction of eukaryotic transcriptional states from 100 marker genes.

Nat Commun. 2017 May 5;8:15309. doi: 10.1038/ncomms15309.

Clustering cancer gene expression data by projective clustering ensemble.

PLoS One. 2017 Feb 24;12(2):e0171429. doi: 10.1371/journal.pone.0171429. eCollection 2017.

Interpolation based consensus clustering for gene expression time series.

BMC Bioinformatics. 2015 Apr 16;16:117. doi: 10.1186/s12859-015-0541-0.
