Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm.

Authors

Grotkjaer Thomas, Winther Ole, Regenberg Birgitte, Nielsen Jens, Hansen Lars Kai

Affiliation

Center for Microbial Biotechnology BioCentrum-DTU, Building 223, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark.

Publication information

Bioinformatics. 2006 Jan 1;22(1):58-67. doi: 10.1093/bioinformatics/bti746. Epub 2005 Oct 27.

DOI: 10.1093/bioinformatics/bti746

PMID: 16257984

Abstract

MOTIVATION

Hierarchical and relocation clustering (e.g. K-means and self-organizing maps) have been successful tools in the display and analysis of whole genome DNA microarray expression data. However, the results of hierarchical clustering are sensitive to outliers, and most relocation methods give results which are dependent on the initialization of the algorithm. Therefore, it is difficult to assess the significance of the results. We have developed a consensus clustering algorithm, where the final result is averaged over multiple clustering runs, giving a robust and reproducible clustering, capable of capturing small signal variations. The algorithm preserves valuable properties of hierarchical clustering, which is useful for visualization and interpretation of the results.

RESULTS

We show for the first time that one can take advantage of multiple clustering runs in DNA microarray analysis by collecting re-occurring clustering patterns in a co-occurrence matrix. The results show that consensus clustering obtained from clustering multiple times with Variational Bayes Mixtures of Gaussians or K-means significantly reduces the classification error rate for a simulated dataset. The method is flexible and it is possible to find consensus clusters from different clustering algorithms. Thus, the algorithm can be used as a framework to test in a quantitative manner the homogeneity of different clustering algorithms. We compare the method with a number of state-of-the-art clustering methods. It is shown that the method is robust and gives low classification error rates for a realistic, simulated dataset. The algorithm is also demonstrated for real datasets. It is shown that more biologically meaningful transcriptional patterns can be found without conservative statistical or fold-change exclusion of data.
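The co-occurrence idea described above can be illustrated with a minimal sketch. This is not the authors' ClusterLustre code (which is Matlab and available only on request); it is an assumed, simplified version that repeats plain K-means from random starts and records, for each pair of profiles, the fraction of runs in which they share a cluster. The function names `kmeans` and `cooccurrence`, the parameters `n_runs` and `seed`, and the toy `profiles` data are all hypothetical.

```python
import random


def kmeans(points, k, rng, n_iter=100):
    """Plain Lloyd's K-means. Returns one cluster label per point."""
    # Random starting centers: this is the source of run-to-run variation.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cooccurrence(points, k, n_runs=50, seed=0):
    """Fraction of K-means runs in which each pair of points shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_runs):
        labels = kmeans(points, k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_runs for c in row] for row in counts]


# Toy expression profiles (assumed data, two conditions per gene).
profiles = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
C = cooccurrence(profiles, k=2, n_runs=20)

# One possible next step (an assumption, not taken from the abstract):
# treat 1 - C as a pairwise distance and feed it to a hierarchical clustering routine.
distance = [[1.0 - c for c in row] for row in C]
```

Pairs that are grouped together in nearly every run get values close to 1, while pairs whose grouping depends on the initialization get intermediate values, which is what makes the averaged result less sensitive to any single run.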

AVAILABILITY

Matlab source code for the clustering algorithm ClusterLustre, and the simulated dataset for testing are available upon request from T.G. and O.W.

Similar articles

Clustering microarray gene expression data using weighted Chinese restaurant process.

Bioinformatics. 2006 Aug 15;22(16):1988-97. doi: 10.1093/bioinformatics/btl284. Epub 2006 Jun 9.

Graph-based consensus clustering for class discovery from gene expression data.

Bioinformatics. 2007 Nov 1;23(21):2888-96. doi: 10.1093/bioinformatics/btm463. Epub 2007 Sep 14.

Detecting clusters of different geometrical shapes in microarray gene expression data.

Bioinformatics. 2005 May 1;21(9):1927-34. doi: 10.1093/bioinformatics/bti251. Epub 2005 Jan 12.

A new algorithm for comparing and visualizing relationships between hierarchical and flat gene expression data clusterings.

Bioinformatics. 2005 Nov 1;21(21):3993-9. doi: 10.1093/bioinformatics/bti644. Epub 2005 Sep 1.

Clustering of change patterns using Fourier coefficients.

Bioinformatics. 2008 Jan 15;24(2):184-91. doi: 10.1093/bioinformatics/btm568. Epub 2007 Nov 19.

Towards clustering of incomplete microarray data without the use of imputation.

Bioinformatics. 2007 Jan 1;23(1):107-13. doi: 10.1093/bioinformatics/btl555. Epub 2006 Oct 31.

Microarray data clustering based on temporal variation: FCV with TSD preclustering.

Appl Bioinformatics. 2003;2(1):35-45.

A mixture model with random-effects components for clustering correlated gene-expression profiles.

Bioinformatics. 2006 Jul 15;22(14):1745-52. doi: 10.1093/bioinformatics/btl165. Epub 2006 May 3.

Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach.

Bioinformatics. 2007 Jul 1;23(13):1607-15. doi: 10.1093/bioinformatics/btm158. Epub 2007 May 5.

Cited by

Front Oncol. 2022 May 24;12:887318. doi: 10.3389/fonc.2022.887318. eCollection 2022.

Sci Rep. 2021 Nov 3;11(1):21609. doi: 10.1038/s41598-021-00678-9.

Evaluation of Twenty Genes in Prognosis of Patients with Ovarian Cancer Using Four Different Clustering Methods.

Asian Pac J Cancer Prev. 2021 Jun 1;22(6):1781-1787. doi: 10.31557/APJCP.2021.22.6.1781.

Clustering cancer gene expression data by projective clustering ensemble.

PLoS One. 2017 Feb 24;12(2):e0171429. doi: 10.1371/journal.pone.0171429. eCollection 2017.

Transcriptome signatures in Helicobacter pylori-infected mucosa identifies acidic mammalian chitinase loss as a corpus atrophy marker.

BMC Med Genomics. 2013 Oct 11;6:41. doi: 10.1186/1755-8794-6-41.

Mapping the polysaccharide degradation potential of Aspergillus niger.

BMC Genomics. 2012 Jul 16;13:313. doi: 10.1186/1471-2164-13-313.

Mapping the interaction of Snf1 with TORC1 in Saccharomyces cerevisiae.

Mol Syst Biol. 2011 Nov 8;7:545. doi: 10.1038/msb.2011.80.

A permutation test for determining significance of clusters with applications to spatial and gene expression data.

Comput Stat Data Anal. 2009 Oct 1;53(12):4290-4300. doi: 10.1016/j.csda.2009.05.031.

Transcriptional regulation of gene expression clusters in motor neurons following spinal cord injury.

BMC Genomics. 2010 Jun 9;11:365. doi: 10.1186/1471-2164-11-365.

Proteome analysis of Aspergillus niger: lactate added in starch-containing medium can increase production of the mycotoxin fumonisin B2 by modifying acetyl-CoA metabolism.

BMC Microbiol. 2009 Dec 10;9:255. doi: 10.1186/1471-2180-9-255.
