Binary tree-structured vector quantization approach to clustering and visualizing microarray data.

Author information

Sultan M, Wigle D A, Cumbaa C A, Maziarz M, Glasgow J, Tsao M S, Jurisica I

Affiliation

Division of Cancer Informatics, Ontario Cancer Institute, 610 University Avenue, Toronto, Ontario, M5G 2M9, Canada.

Publication information

Bioinformatics. 2002;18 Suppl 1:S111-9. doi: 10.1093/bioinformatics/18.suppl_1.s111.

PMID:12169538

Abstract

MOTIVATION

With the increasing number of gene expression databases, the need for more powerful analysis and visualization tools is growing. Many techniques have successfully been applied to unravel latent similarities among genes and/or experiments. Most of the current systems for microarray data analysis use statistical methods, hierarchical clustering, self-organizing maps, support vector machines, or k-means clustering to organize genes or experiments into 'meaningful' groups. Without prior explicit bias almost all of these clustering methods applied to gene expression data not only produce different results, but may also produce clusters with little or no biological relevance. Of these methods, agglomerative hierarchical clustering has been the most widely applied, although many limitations have been identified.

RESULTS

Starting with a systematic comparison of the underlying theories behind clustering approaches, we have devised a technique that combines tree-structured vector quantization and partitive k-means clustering (BTSVQ). This hybrid technique has revealed clinically relevant clusters in three large publicly available data sets. In contrast to existing systems, our approach is less sensitive to data preprocessing and data normalization. In addition, the clustering results produced by the technique have strong similarities to those of self-organizing maps (SOMs). We discuss the advantages and the mathematical reasoning behind our approach.
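The abstract does not spell out how the tree is grown, so the following is only a minimal sketch of the general idea of combining a binary tree with k-means: each node is split in two by k-means with k = 2 (often called bisecting k-means). It is not the authors' BTSVQ implementation. The function names (`two_means`, `build_tree`, `leaves`), the farthest-point initialization, and the stopping parameters `min_size` and `max_depth` are all assumptions made for illustration.

```python
import numpy as np


def two_means(points, n_iter=20):
    """Split points into two groups with plain k-means (k = 2)."""
    # Assumed initialization: the point farthest from the mean, then the
    # point farthest from that one. This keeps the sketch deterministic.
    center = points.mean(axis=0)
    first = ((points - center) ** 2).sum(axis=1).argmax()
    second = ((points - points[first]) ** 2).sum(axis=1).argmax()
    centroids = np.stack([points[first], points[second]]).astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        for k in range(2):
            if np.any(labels == k):
                centroids[k] = points[labels == k].mean(axis=0)
    return labels


def build_tree(points, indices=None, min_size=4, max_depth=3, depth=0):
    """Recursively bisect the rows of points; returns nested dicts of row indices."""
    if indices is None:
        indices = np.arange(len(points))
    node = {"indices": indices, "children": []}
    # Hypothetical stopping rules: depth limit and minimum node size.
    if depth >= max_depth or len(indices) < 2 * min_size:
        return node
    labels = two_means(points[indices])
    left, right = indices[labels == 0], indices[labels == 1]
    # Keep the node as a leaf if either child would be too small.
    if min(len(left), len(right)) < min_size:
        return node
    node["children"] = [
        build_tree(points, left, min_size, max_depth, depth + 1),
        build_tree(points, right, min_size, max_depth, depth + 1),
    ]
    return node


def leaves(node):
    """Collect the index arrays stored at the leaves of the tree."""
    if not node["children"]:
        return [node["indices"]]
    return [leaf for child in node["children"] for leaf in leaves(child)]


# Toy data (not from the paper): 20 samples with 5 features, in two groups.
base = np.linspace(-0.1, 0.1, 10)[:, None] * np.ones((1, 5))
data = np.vstack([base, base + 5.0])
tree = build_tree(data, max_depth=1)
print([len(leaf) for leaf in leaves(tree)])
```

In a microarray setting the rows of `data` would be samples (or genes) and the columns expression values. Each level of the resulting tree gives a coarser or finer partition, which is what makes a tree-structured codebook convenient for visualization.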

Similar articles

An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data.

Bioinformatics. 2003 Nov 1;19(16):2131-40. doi: 10.1093/bioinformatics/btg296.

A new algorithm for comparing and visualizing relationships between hierarchical and flat gene expression data clusterings.

Bioinformatics. 2005 Nov 1;21(21):3993-9. doi: 10.1093/bioinformatics/bti644. Epub 2005 Sep 1.

Co-clustering and visualization of gene expression data and gene ontology terms for Saccharomyces cerevisiae using self-organizing maps.

J Biomed Inform. 2007 Apr;40(2):160-73. doi: 10.1016/j.jbi.2006.05.001. Epub 2006 May 20.

Detecting clusters of different geometrical shapes in microarray gene expression data.

Bioinformatics. 2005 May 1;21(9):1927-34. doi: 10.1093/bioinformatics/bti251. Epub 2005 Jan 12.

Clustering of gene expression data: performance and similarity analysis.

BMC Bioinformatics. 2006 Dec 12;7 Suppl 4(Suppl 4):S19. doi: 10.1186/1471-2105-7-S4-S19.

A dynamically growing self-organizing tree (DGSOT) for hierarchical clustering gene expression profiles.

Bioinformatics. 2004 Nov 1;20(16):2605-17. doi: 10.1093/bioinformatics/bth292. Epub 2004 May 6.

FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data.

BMC Bioinformatics. 2007 Jan 4;8:3. doi: 10.1186/1471-2105-8-3.

Effect of data normalization on fuzzy clustering of DNA microarray data.

BMC Bioinformatics. 2006 Mar 14;7:134. doi: 10.1186/1471-2105-7-134.

Evaluation and comparison of gene clustering methods in microarray analysis.

Bioinformatics. 2006 Oct 1;22(19):2405-12. doi: 10.1093/bioinformatics/btl406. Epub 2006 Jul 31.

Cited by

From eHealth to iHealth: Transition to Participatory and Personalized Medicine in Mental Health.

J Med Internet Res. 2018 Jan 3;20(1):e2. doi: 10.2196/jmir.7412.

Quantitative analysis of mammalian translation initiation sites by FACS-seq.

Mol Syst Biol. 2014 Aug 28;10(8):748. doi: 10.15252/msb.20145136.

Knowledge Discovery and interactive Data Mining in Bioinformatics--State-of-the-Art, future challenges and research directions.

BMC Bioinformatics. 2014;15 Suppl 6(Suppl 6):I1. doi: 10.1186/1471-2105-15-S6-I1. Epub 2014 May 16.

KNODWAT: a scientific framework application for testing knowledge discovery methods for the biomedical domain.

BMC Bioinformatics. 2013 Jun 13;14:191. doi: 10.1186/1471-2105-14-191.

Applications of microarray technology to Acute Myelogenous Leukemia.

Cancer Inform. 2009;7:13-28. doi: 10.4137/cin.s1015. Epub 2008 Dec 22.

Molecular evidence of placental hypoxia in preeclampsia.

J Clin Endocrinol Metab. 2005 Jul;90(7):4299-308. doi: 10.1210/jc.2005-0078. Epub 2005 Apr 19.

Stability and heterogeneity of expression profiles in lung cancer specimens harvested following surgical resection.

Neoplasia. 2004 Nov-Dec;6(6):761-7. doi: 10.1593/neo.04301.
