Graph-based consensus clustering for class discovery from gene expression data.

Authors

Yu Zhiwen, Wong Hau-San, Wang Hongqiang

Affiliation

Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong.

Publication information

Bioinformatics. 2007 Nov 1;23(21):2888-96. doi: 10.1093/bioinformatics/btm463. Epub 2007 Sep 14.

DOI:10.1093/bioinformatics/btm463
PMID:17872912
Abstract

MOTIVATION

Consensus clustering, also known as cluster ensemble, is one of the important techniques for microarray data analysis, and is particularly useful for class discovery from microarray data. Compared with traditional clustering algorithms, consensus clustering approaches have the ability to integrate multiple partitions from different cluster solutions to improve the robustness, stability, scalability and parallelization of the clustering algorithms. By consensus clustering, one can discover the underlying classes of the samples in gene expression data.
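The idea of integrating multiple partitions can be illustrated with a co-association matrix, a common building block of cluster ensembles. The sketch below is not the authors' GCC algorithm, which the abstract does not specify. It assumes a simple scheme in which two samples are linked when they share a cluster in more than half of the input partitions, and the consensus classes are the connected components of that graph. The function names, the `threshold` parameter, and the toy partitions are all hypothetical.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of samples shares a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus_labels(co, threshold=0.5):
    """Link samples whose co-association exceeds the threshold (an assumed
    rule) and label each connected component of the resulting graph."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first traversal of one component
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three toy partitions of six samples; label values are arbitrary per partition.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [2, 2, 2, 0, 0, 1],
]
co = co_association(partitions)
print(consensus_labels(co))  # prints [0, 0, 0, 1, 1, 1]
```

Because the matrix only records whether two samples share a cluster, the arbitrary label values used by each input partition do not matter, which is what makes partitions from different clustering runs directly combinable.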

RESULTS

In addition to exploring a graph-based consensus clustering (GCC) algorithm to estimate the underlying classes of the samples in microarray data, we also design a new validation index to determine the number of classes in microarray data. To our knowledge, this is the first time GCC has been applied to class discovery for microarray data. Given a prespecified maximum number of classes (denoted K(max) in this article), our algorithm can discover the true number of classes for the samples in microarray data according to a new cluster validation index called the Modified Rand Index. Experiments on gene expression data indicate that our new algorithm can (i) outperform most of the existing algorithms, (ii) identify the number of classes correctly in real cancer datasets, and (iii) discover the classes of samples with biological meaning.
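Selecting the number of classes from candidates up to K(max) with a validation index can be sketched as follows. The abstract does not define the Modified Rand Index, so this example substitutes the classical Rand index as a stand-in and should not be read as the authors' procedure. It assumes that one candidate partition per value of K is already available. The names `rand_index`, `pick_k`, `candidates`, and `base_partitions`, and all toy data, are hypothetical.

```python
from itertools import combinations


def rand_index(a, b):
    """Classical Rand index: fraction of sample pairs on which two
    partitions agree (both together or both apart)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def pick_k(candidates, base_partitions):
    """Return the K whose candidate partition has the highest mean
    Rand index against the base partitions."""
    def score(k):
        total = sum(rand_index(candidates[k], p) for p in base_partitions)
        return total / len(base_partitions)
    return max(sorted(candidates), key=score)


# Toy base partitions, e.g. from repeated runs of a clustering algorithm.
base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
# Hypothetical candidate partitions for K = 2 .. K_max (here K_max = 4).
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
    4: [0, 0, 1, 2, 2, 3],
}
print(pick_k(candidates, base_partitions))  # prints 2
```

In this toy case the two-class candidate agrees best with the base partitions (mean Rand index 40/45, versus 38/45 for K = 3 and 32/45 for K = 4), so K = 2 is selected.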

AVAILABILITY

Matlab source code for the GCC algorithm is available upon request from Zhiwen Yu.

Similar articles

1. Graph-based consensus clustering for class discovery from gene expression data.
   Bioinformatics. 2007 Nov 1;23(21):2888-96. doi: 10.1093/bioinformatics/btm463. Epub 2007 Sep 14.
2. A mixture model with random-effects components for clustering correlated gene-expression profiles.
   Bioinformatics. 2006 Jul 15;22(14):1745-52. doi: 10.1093/bioinformatics/btl165. Epub 2006 May 3.
3. Class discovery from gene expression data based on perturbation and cluster ensemble.
   IEEE Trans Nanobioscience. 2009 Jun;8(2):147-60. doi: 10.1109/TNB.2009.2023321. Epub 2009 Jun 2.
4. Clustering of change patterns using Fourier coefficients.
   Bioinformatics. 2008 Jan 15;24(2):184-91. doi: 10.1093/bioinformatics/btm568. Epub 2007 Nov 19.
5. Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach.
   Bioinformatics. 2007 Jul 1;23(13):1607-15. doi: 10.1093/bioinformatics/btm158. Epub 2007 May 5.
6. Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm.
   Bioinformatics. 2006 Jan 1;22(1):58-67. doi: 10.1093/bioinformatics/bti746. Epub 2005 Oct 27.
7. A multi-stage approach to clustering and imputation of gene expression profiles.
   Bioinformatics. 2007 Apr 15;23(8):998-1005. doi: 10.1093/bioinformatics/btm053. Epub 2007 Feb 18.
8. An iterative data mining approach for mining overlapping coexpression patterns in noisy gene expression data.
   IEEE Trans Nanobioscience. 2009 Sep;8(3):252-8. doi: 10.1109/TNB.2009.2026747. Epub 2009 Jul 14.
9. Divisive Correlation Clustering Algorithm (DCCA) for grouping of genes: detecting varying patterns in expression profiles.
   Bioinformatics. 2008 Jun 1;24(11):1359-66. doi: 10.1093/bioinformatics/btn133. Epub 2008 Apr 10.
10. An improved algorithm for clustering gene expression data.
    Bioinformatics. 2007 Nov 1;23(21):2859-65. doi: 10.1093/bioinformatics/btm418. Epub 2007 Aug 25.

Cited by

1. Cross-talk of m6A methylation modification and the tumor microenvironment composition in esophageal cancer.
   Front Immunol. 2025 Jul 7;16:1572810. doi: 10.3389/fimmu.2025.1572810. eCollection 2025.
2. VIASCKDE Index: A Novel Internal Cluster Validity Index for Arbitrary-Shaped Clusters Based on the Kernel Density Estimation.
   Comput Intell Neurosci. 2022 Jun 8;2022:4059302. doi: 10.1155/2022/4059302. eCollection 2022.
3. ClusterMine: A knowledge-integrated clustering approach based on expression profiles of gene sets.
   J Bioinform Comput Biol. 2020 Jun;18(3):2040009. doi: 10.1142/S0219720020400090.
4. Overlapping clustering of gene expression data using penalized weighted normalized cut.
   Genet Epidemiol. 2018 Dec;42(8):796-811. doi: 10.1002/gepi.22164. Epub 2018 Oct 9.
5. Assisted gene expression-based clustering with AWNCut.
   Stat Med. 2018 Dec 20;37(29):4386-4403. doi: 10.1002/sim.7928. Epub 2018 Aug 9.
6. Cluster ensemble based on Random Forests for genetic data.
   BioData Min. 2017 Dec 15;10:37. doi: 10.1186/s13040-017-0156-2. eCollection 2017.
7. Spectral clustering using Nyström approximation for the accurate identification of cancer molecular subtypes.
   Sci Rep. 2017 Jul 7;7(1):4896. doi: 10.1038/s41598-017-05275-3.
8. Tradict enables accurate prediction of eukaryotic transcriptional states from 100 marker genes.
   Nat Commun. 2017 May 5;8:15309. doi: 10.1038/ncomms15309.
9. Clustering cancer gene expression data by projective clustering ensemble.
   PLoS One. 2017 Feb 24;12(2):e0171429. doi: 10.1371/journal.pone.0171429. eCollection 2017.
10. Interpolation based consensus clustering for gene expression time series.
    BMC Bioinformatics. 2015 Apr 16;16:117. doi: 10.1186/s12859-015-0541-0.