
Merged consensus clustering to assess and improve class discovery with microarray data.

Affiliation

Genes and Development Group, Centre for Integrative Physiology, University of Edinburgh, Hugh Robson Building, George Square, Edinburgh, EH8 9XD, UK.

Publication information

BMC Bioinformatics. 2010 Dec 3;11:590. doi: 10.1186/1471-2105-11-590.

DOI:10.1186/1471-2105-11-590
PMID:21129181
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3002369/

Abstract

BACKGROUND

One of the most commonly performed tasks when analysing high throughput gene expression data is to use clustering methods to classify the data into groups. There are a large number of methods available to perform clustering, but it is often unclear which method is best suited to the data and how to quantify the quality of the classifications produced.

RESULTS

Here we describe an R package containing methods to analyse the consistency of clustering results from any number of different clustering methods using resampling statistics. These methods allow the identification of the best supported clusters and additionally rank cluster members by their fidelity within the cluster. These metrics allow us to compare the performance of different clustering algorithms under different experimental conditions and to select those that produce the most reliable clustering structures. We show the application of this method to simulated data, canonical gene expression experiments and our own novel analysis of genes involved in the specification of the peripheral nervous system in the fruit fly, Drosophila melanogaster.
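
The resampling idea described above can be illustrated with a minimal sketch. It is written in Python with NumPy rather than R, and it does not reproduce the clusterCons API: the function names (`kmeans_labels`, `consensus_matrix`, `robustness`), the toy k-means routine, the 80% subsampling fraction and the robustness formulas (mean pairwise consensus within a cluster, and mean consensus of one member with its cluster mates) are all assumptions chosen for illustration.

```python
import numpy as np

def kmeans_labels(x, k, rng, n_iter=20):
    """Tiny Lloyd's k-means; a stand-in for any clustering method."""
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centre if a cluster empties
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def consensus_matrix(x, k, n_resamples=50, frac=0.8, seed=0):
    """Fraction of resamples in which each pair of items is co-clustered."""
    rng = np.random.default_rng(seed)
    n = len(x)
    together = np.zeros((n, n))  # times a pair shared a cluster
    sampled = np.zeros((n, n))   # times a pair was drawn together
    for _ in range(n_resamples):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = kmeans_labels(x[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        sampled[np.ix_(idx, idx)] += 1.0
        together[np.ix_(idx, idx)] += same
    return np.divide(together, sampled,
                     out=np.zeros_like(together), where=sampled > 0)

def robustness(consensus, labels):
    """Assumed definitions: cluster robustness is the mean off-diagonal
    consensus inside a cluster; member robustness is the mean consensus
    of one member with the other members of its cluster."""
    cluster_rob = {}
    member_rob = np.zeros(len(labels))
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        m = len(members)
        if m < 2:  # a singleton has no pairs to average
            cluster_rob[int(c)] = 1.0
            member_rob[members] = 1.0
            continue
        sub = consensus[np.ix_(members, members)]
        cluster_rob[int(c)] = (sub.sum() - np.trace(sub)) / (m * (m - 1))
        member_rob[members] = (sub.sum(axis=1) - np.diag(sub)) / (m - 1)
    return cluster_rob, member_rob

# Toy data: two well-separated groups of 15 items in 4 dimensions.
data_rng = np.random.default_rng(1)
data = np.vstack([data_rng.normal(0, 0.3, (15, 4)),
                  data_rng.normal(5, 0.3, (15, 4))])
cm = consensus_matrix(data, k=2)
reference = kmeans_labels(data, 2, np.random.default_rng(2))
cluster_rob, member_rob = robustness(cm, reference)
print(cluster_rob)
```

Sorting `member_rob` within each cluster gives a ranking of members by how consistently they stay with their cluster across resamples, which is the kind of fidelity ranking the abstract refers to.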

CONCLUSIONS

Our package enables users to apply the merged consensus clustering methodology conveniently within the R programming environment, providing both analysis and graphical display functions for exploring clustering approaches. It extends the basic principle of consensus clustering by allowing the merging of results between different methods to provide an averaged clustering robustness. We show that this extension is useful in correcting for the tendency of clustering algorithms to treat outliers differently within datasets. The R package, clusterCons, is freely available at CRAN and SourceForge under the GNU public licence.
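
As a rough illustration of the merging step, the sketch below averages consensus matrices from two hypothetical clustering methods element by element. It is a Python sketch under stated assumptions, not the clusterCons implementation: the function name `merge_consensus`, the unweighted mean and the two example matrices are invented for illustration.

```python
import numpy as np

def merge_consensus(matrices):
    """Element-wise mean of same-shaped consensus matrices
    (assumed merge rule: simple unweighted averaging)."""
    stacked = np.stack([np.asarray(m, dtype=float) for m in matrices])
    return stacked.mean(axis=0)

# Hypothetical consensus matrices for three items from two methods.
# Method A always groups item 2 with items 0 and 1.
method_a = np.array([[1.0, 1.0, 1.0],
                     [1.0, 1.0, 1.0],
                     [1.0, 1.0, 1.0]])
# Method B always isolates item 2 as an outlier.
method_b = np.array([[1.0, 1.0, 0.0],
                     [1.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

merged = merge_consensus([method_a, method_b])
print(merged)
```

In the merged matrix the pair (0, 1), on which both methods agree, keeps a consensus of 1.0, while pairs involving item 2 drop to 0.5. This shows how averaging across methods can expose items, such as outliers, that different algorithms treat differently.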


