Merged consensus clustering to assess and improve class discovery with microarray data.

Affiliation

Genes and Development Group, Centre for Integrative Physiology, University of Edinburgh, Hugh Robson Building, George Square, Edinburgh, EH8 9XD, UK.

Publication information

BMC Bioinformatics. 2010 Dec 3;11:590. doi: 10.1186/1471-2105-11-590.

Abstract

BACKGROUND

One of the most common tasks when analysing high-throughput gene expression data is to use clustering methods to classify the data into groups. Many clustering methods are available, but it is often unclear which is best suited to the data and how to quantify the quality of the classifications produced.

RESULTS

Here we describe an R package containing methods to analyse the consistency of clustering results from any number of different clustering methods using resampling statistics. These methods allow the identification of the best supported clusters and additionally rank cluster members by their fidelity within the cluster. These metrics allow us to compare the performance of different clustering algorithms under different experimental conditions and to select those that produce the most reliable clustering structures. We show the application of this method to simulated data, canonical gene expression experiments and our own novel analysis of genes involved in the specification of the peripheral nervous system in the fruit fly, Drosophila melanogaster.
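
As a rough illustration of the resampling idea, the sketch below builds a consensus matrix (the fraction of resamples in which two items co-cluster) and derives a cluster-level and a member-level robustness score from it. It is written in plain Python, not R, and does not use the clusterCons API. The toy data, the `midpoint_split` clustering rule, the exhaustive leave-two-out subsampling, and the exact robustness definitions (mean pairwise consensus) are all assumptions made for this example.

```python
from itertools import combinations

def midpoint_split(values):
    """Toy two-cluster method (hypothetical): split at the midpoint of the range."""
    cut = (min(values.values()) + max(values.values())) / 2
    return {i: int(v > cut) for i, v in values.items()}

def consensus_matrix(runs, n_items):
    """Fraction of runs in which each pair of items lands in the same cluster.

    Each run is a dict {item index: label} covering only the resampled items.
    """
    together = [[0] * n_items for _ in range(n_items)]
    sampled = [[0] * n_items for _ in range(n_items)]
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n_items)] for i in range(n_items)]

def cluster_robustness(matrix, members):
    """Mean consensus over all pairs of items in a proposed cluster."""
    pairs = list(combinations(members, 2))
    return sum(matrix[i][j] for i, j in pairs) / len(pairs)

def member_robustness(matrix, members, item):
    """Mean consensus between one item and the other cluster members."""
    others = [j for j in members if j != item]
    return sum(matrix[item][j] for j in others) / len(others)

# Made-up 1-D data: two tight groups plus one ambiguous point (index 6).
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 2.58]

# Deterministic stand-in for random resampling: every subset of 5 items.
runs = [midpoint_split({i: data[i] for i in subset})
        for subset in combinations(range(len(data)), 5)]
matrix = consensus_matrix(runs, len(data))

# Rank the members of a proposed cluster by how consistently they belong.
cluster = [0, 1, 2, 6]
ranking = sorted(cluster,
                 key=lambda i: member_robustness(matrix, cluster, i),
                 reverse=True)
print(cluster_robustness(matrix, cluster))
print(ranking)
```

In this toy setting the ambiguous point ranks last within its cluster, which is the kind of member ranking the abstract describes.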

CONCLUSIONS

Our package enables users to apply the merged consensus clustering methodology conveniently within the R programming environment, providing both analysis and graphical display functions for exploring clustering approaches. It extends the basic principle of consensus clustering by allowing the merging of results between different methods to provide an averaged clustering robustness. We show that this extension is useful in correcting for the tendency of clustering algorithms to treat outliers differently within datasets. The R package, clusterCons, is freely available at CRAN and SourceForge under the GNU public licence.
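
The merging step can be pictured as averaging the consensus matrices produced by different clustering methods. The following Python sketch assumes an unweighted element-wise mean and uses two made-up 3 x 3 consensus matrices. The function names and values are hypothetical and are not taken from clusterCons.

```python
from itertools import combinations

def merge_consensus(matrices):
    """Element-wise mean of equally sized consensus matrices (assumed rule)."""
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / len(matrices)
             for j in range(n)] for i in range(n)]

def cluster_robustness(matrix, members):
    """Mean consensus over all pairs of items in a proposed cluster."""
    pairs = list(combinations(members, 2))
    return sum(matrix[i][j] for i, j in pairs) / len(pairs)

# Made-up consensus matrices for three items; item 2 plays the outlier.
# Hypothetical method A always groups the outlier with items 0 and 1.
method_a = [[1.0, 1.0, 1.0],
            [1.0, 1.0, 1.0],
            [1.0, 1.0, 1.0]]
# Hypothetical method B always isolates the outlier.
method_b = [[1.0, 1.0, 0.0],
            [1.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

merged = merge_consensus([method_a, method_b])
print(merged[0][2])
print(cluster_robustness(merged, [0, 1, 2]))
```

Because the two methods disagree only about the outlier, the merged matrix keeps full support for the pair (0, 1) while down-weighting the outlier's membership.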

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0508/3002369/798927af93c8/1471-2105-11-590-1.jpg

Similar articles

1
Merged consensus clustering to assess and improve class discovery with microarray data.
BMC Bioinformatics. 2010 Dec 3;11:590. doi: 10.1186/1471-2105-11-590.
2
Graph-based consensus clustering for class discovery from gene expression data.
Bioinformatics. 2007 Nov 1;23(21):2888-96. doi: 10.1093/bioinformatics/btm463. Epub 2007 Sep 14.
3
FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data.
BMC Bioinformatics. 2007 Jan 4;8:3. doi: 10.1186/1471-2105-8-3.
4
Clustering of gene expression data: performance and similarity analysis.
BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S19. doi: 10.1186/1471-2105-7-S4-S19.
5
Consensus framework for exploring microarray data using multiple clustering methods.
OMICS. 2007 Spring;11(1):116-28. doi: 10.1089/omi.2006.0008.
6
Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm.
Bioinformatics. 2006 Jan 1;22(1):58-67. doi: 10.1093/bioinformatics/bti746. Epub 2005 Oct 27.
7
Evaluation of clustering algorithms for gene expression data.
BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S17. doi: 10.1186/1471-2105-7-S4-S17.
8
Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach.
Bioinformatics. 2007 Jul 1;23(13):1607-15. doi: 10.1093/bioinformatics/btm158. Epub 2007 May 5.
9
Simultaneous gene clustering and subset selection for sample classification via MDL.
Bioinformatics. 2003 Jun 12;19(9):1100-9. doi: 10.1093/bioinformatics/btg039.
10
NIFTI: an evolutionary approach for finding number of clusters in microarray data.
BMC Bioinformatics. 2009 Jan 30;10:40. doi: 10.1186/1471-2105-10-40.

Cited by

1
BioNAR: an integrated biological network analysis package in bioconductor.
Bioinform Adv. 2023 Sep 29;3(1):vbad137. doi: 10.1093/bioadv/vbad137. eCollection 2023.
2
Unsupervised Algorithms for Microarray Sample Stratification.
Methods Mol Biol. 2022;2401:121-146. doi: 10.1007/978-1-0716-1839-4_9.
3
Dissecting the Shared and Context-Dependent Pathways Mediated by the p140Cap Adaptor Protein in Cancer and in Neurons.
Front Cell Dev Biol. 2019 Oct 15;7:222. doi: 10.3389/fcell.2019.00222. eCollection 2019.
4
Regional Diversity in the Postsynaptic Proteome of the Mouse Brain.
Proteomes. 2018 Aug 1;6(3):31. doi: 10.3390/proteomes6030031.
5
Synaptic Interactome Mining Reveals p140Cap as a New Hub for PSD Proteins Involved in Psychiatric and Neurological Disorders.
Front Mol Neurosci. 2017 Jun 30;10:212. doi: 10.3389/fnmol.2017.00212. eCollection 2017.
7
Conservation of immune gene signatures in solid tumors and prognostic implications.
BMC Cancer. 2016 Nov 22;16(1):911. doi: 10.1186/s12885-016-2948-z.
8
Learning a hierarchical representation of the yeast transcriptomic machinery using an autoencoder model.
BMC Bioinformatics. 2016 Jan 11;17 Suppl 1(Suppl 1):9. doi: 10.1186/s12859-015-0852-1.
9
Is there still a French eating model? A taxonomy of eating behaviors in adults living in the Paris metropolitan area in 2010.
PLoS One. 2015 Mar 3;10(3):e0119161. doi: 10.1371/journal.pone.0119161. eCollection 2015.

References

1
A genomic atlas of mouse hypothalamic development.
Nat Neurosci. 2010 Jun;13(6):767-75. doi: 10.1038/nn.2545. Epub 2010 May 2.
2
3
Overview on techniques in cluster analysis.
Methods Mol Biol. 2010;593:81-107. doi: 10.1007/978-1-60327-194-3_5.
4
Transcriptome profiling of human pre-implantation development.
PLoS One. 2009 Nov 16;4(11):e7844. doi: 10.1371/journal.pone.0007844.
6
Fuzzy cluster stability analysis with missing values using resampling.
Int J Bioinform Res Appl. 2009;5(2):207-23. doi: 10.1504/IJBRA.2009.024038.
7
A decade of tissue microarrays: progress in the discovery and validation of cancer biomarkers.
J Clin Oncol. 2008 Dec 1;26(34):5630-7. doi: 10.1200/JCO.2008.17.3567. Epub 2008 Oct 20.
8
Factorial microarray analysis of zebrafish retinal development.
Proc Natl Acad Sci U S A. 2008 Sep 2;105(35):12909-14. doi: 10.1073/pnas.0806038105. Epub 2008 Aug 27.
10
New resampling method for evaluating stability of clusters.
BMC Bioinformatics. 2008 Jan 24;9:42. doi: 10.1186/1471-2105-9-42.
