A new algorithm for comparing and visualizing relationships between hierarchical and flat gene expression data clusterings.

Author information

Torrente Aurora, Kapushesky Misha, Brazma Alvis

Affiliation

EMBL Outstation-Hinxton, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

Publication information

Bioinformatics. 2005 Nov 1;21(21):3993-9. doi: 10.1093/bioinformatics/bti644. Epub 2005 Sep 1.

DOI:10.1093/bioinformatics/bti644

PMID:16141251

Abstract

MOTIVATION

Clustering is one of the most widely used methods in unsupervised gene expression data analysis. The use of different clustering algorithms or different parameters often produces rather different results on the same data. Biological interpretation of multiple clustering results requires understanding how different clusters relate to each other. It is particularly non-trivial to compare the results of a hierarchical and a flat, e.g. k-means, clustering.

RESULTS

We present a new method for comparing and visualizing relationships between different clustering results, either flat versus flat, or flat versus hierarchical. When comparing a flat clustering to a hierarchical clustering, the algorithm cuts different branches in the hierarchical tree at different levels to optimize the correspondence between the clusters. The optimization function is based on graph layout aesthetics or on mutual information. The clusters are displayed using a bipartite graph where the edges are weighted proportionally to the number of common elements in the respective clusters and the weighted number of crossings is minimized. The performance of the algorithm is tested using simulated and real gene expression data. The algorithm is implemented in the online gene expression data analysis tool Expression Profiler.
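The edge weights, the mutual-information score and the weighted crossing count described above can be illustrated with a small flat-versus-flat sketch. It assumes that each flat clustering is a Python dict mapping gene IDs to cluster labels, defines the weighted crossing count as the sum of w(e)*w(f) over every pair of crossing edges, and uses the standard barycenter heuristic as a stand-in for crossing minimization. The function names, the input format and these choices are hypothetical illustrations, not the paper's or Expression Profiler's implementation, and cutting the hierarchical tree is not shown.

```python
from collections import Counter
from itertools import combinations
from math import log


def edge_weights(flat_a, flat_b):
    """Bipartite edge weights: (cluster in A, cluster in B) -> number of shared genes."""
    # Assumed input format: each clustering maps gene ID -> cluster label.
    return Counter((flat_a[g], flat_b[g]) for g in flat_a)


def mutual_information(flat_a, flat_b):
    """Mutual information (natural log) between two flat clusterings of the same genes."""
    n = len(flat_a)
    size_a = Counter(flat_a.values())
    size_b = Counter(flat_b.values())
    return sum((n_ij / n) * log(n_ij * n / (size_a[i] * size_b[j]))
               for (i, j), n_ij in edge_weights(flat_a, flat_b).items())


def weighted_crossings(order_a, order_b, weights):
    """Sum of w(e) * w(f) over pairs of edges e, f that cross when A's clusters
    are drawn in order_a on one side and B's clusters in order_b on the other."""
    pos_a = {c: k for k, c in enumerate(order_a)}
    pos_b = {c: k for k, c in enumerate(order_b)}
    total = 0
    for ((a1, b1), w1), ((a2, b2), w2) in combinations(weights.items(), 2):
        # Two edges cross iff their endpoints are in opposite order on the two sides;
        # edges sharing an endpoint give a zero product and are not counted.
        if (pos_a[a1] - pos_a[a2]) * (pos_b[b1] - pos_b[b2]) < 0:
            total += w1 * w2
    return total


def barycenter_order(order_a, weights):
    """Order B's clusters by the weight-averaged position of their neighbours in order_a."""
    pos_a = {c: k for k, c in enumerate(order_a)}
    num, den = Counter(), Counter()
    for (a, b), w in weights.items():
        num[b] += w * pos_a[a]
        den[b] += w
    return sorted(den, key=lambda b: num[b] / den[b])


# Hypothetical toy data: six genes, two flat clusterings with three clusters each.
flat_a = {"g1": "A1", "g2": "A1", "g3": "A2", "g4": "A2", "g5": "A3", "g6": "A3"}
flat_b = {"g1": "B3", "g2": "B3", "g3": "B2", "g4": "B1", "g5": "B1", "g6": "B1"}
w = edge_weights(flat_a, flat_b)
order_a = ["A1", "A2", "A3"]
print(weighted_crossings(order_a, ["B1", "B2", "B3"], w))  # 10
order_b = barycenter_order(order_a, w)                     # ['B3', 'B2', 'B1']
print(weighted_crossings(order_a, order_b, w))             # 0
print(round(mutual_information(flat_a, flat_b), 4))        # 0.7804
```

In this toy case the barycenter reordering removes every crossing; in general the heuristic only reduces crossings and carries no optimality guarantee. Note also that mutual information never decreases when one clustering is refined, so using it to decide where to cut a tree would need an additional stopping criterion, which this sketch does not attempt.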

AVAILABILITY

http://www.ebi.ac.uk/expressionprofiler

Similar articles

Detecting clusters of different geometrical shapes in microarray gene expression data.

Bioinformatics. 2005 May 1;21(9):1927-34. doi: 10.1093/bioinformatics/bti251. Epub 2005 Jan 12.

Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach.

Bioinformatics. 2007 Jul 1;23(13):1607-15. doi: 10.1093/bioinformatics/btm158. Epub 2007 May 5.

Hierarchical tree snipping: clustering guided by prior knowledge.

Bioinformatics. 2007 Dec 15;23(24):3335-42. doi: 10.1093/bioinformatics/btm526. Epub 2007 Nov 7.

Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm.

Bioinformatics. 2006 Jan 1;22(1):58-67. doi: 10.1093/bioinformatics/bti746. Epub 2005 Oct 27.

Microarray data clustering based on temporal variation: FCV with TSD preclustering.

Appl Bioinformatics. 2003;2(1):35-45.

Graph-based consensus clustering for class discovery from gene expression data.

Bioinformatics. 2007 Nov 1;23(21):2888-96. doi: 10.1093/bioinformatics/btm463. Epub 2007 Sep 14.

Divisive Correlation Clustering Algorithm (DCCA) for grouping of genes: detecting varying patterns in expression profiles.

Bioinformatics. 2008 Jun 1;24(11):1359-66. doi: 10.1093/bioinformatics/btn133. Epub 2008 Apr 10.

LCE: a link-based cluster ensemble method for improved gene expression data analysis.

Bioinformatics. 2010 Jun 15;26(12):1513-9. doi: 10.1093/bioinformatics/btq226. Epub 2010 May 5.

Analysis of a Gibbs sampler method for model-based clustering of gene expression data.

Bioinformatics. 2008 Jan 15;24(2):176-83. doi: 10.1093/bioinformatics/btm562. Epub 2007 Nov 22.

Cited by

clustComp, a bioconductor package for the comparison of clustering results.

Bioinformatics. 2017 Dec 15;33(24):4001-4003. doi: 10.1093/bioinformatics/btx532.

Clustering of High Throughput Gene Expression Data.

Comput Oper Res. 2012 Dec;39(12):3046-3061. doi: 10.1016/j.cor.2012.03.008.

Gene selection and classification for cancer microarray data based on machine learning and similarity measures.

BMC Genomics. 2011 Dec 23;12 Suppl 5(Suppl 5):S1. doi: 10.1186/1471-2164-12-S5-S1.

A CD8+ T cell transcription signature predicts prognosis in autoimmune disease.

Nat Med. 2010 May;16(5):586-91, 1p following 591. doi: 10.1038/nm.2130. Epub 2010 Apr 18.

Feature selection and classification of MAQC-II breast cancer and multiple myeloma microarray gene expression data.

PLoS One. 2009 Dec 11;4(12):e8250. doi: 10.1371/journal.pone.0008250.

VisHiC--hierarchical functional enrichment analysis of microarray data.

Nucleic Acids Res. 2009 Jul;37(Web Server issue):W587-92. doi: 10.1093/nar/gkp435. Epub 2009 May 29.

Global considerations in hierarchical clustering reveal meaningful patterns in data.

PLoS One. 2008 May 21;3(5):e2247. doi: 10.1371/journal.pone.0002247.

A general strategy to determine the congruence between a hierarchical and a non-hierarchical classification.

BMC Bioinformatics. 2007 Nov 15;8:442. doi: 10.1186/1471-2105-8-442.
