Binary tree-structured vector quantization approach to clustering and visualizing microarray data.

Author information

Sultan M, Wigle D A, Cumbaa C A, Maziarz M, Glasgow J, Tsao M S, Jurisica I

Affiliation

Division of Cancer Informatics, Ontario Cancer Institute, 610 University Avenue, Toronto, Ontario, M5G 2M9, Canada.

Publication information

Bioinformatics. 2002;18 Suppl 1:S111-9. doi: 10.1093/bioinformatics/18.suppl_1.s111.

Abstract

MOTIVATION

With the increasing number of gene expression databases, the need for more powerful analysis and visualization tools is growing. Many techniques have successfully been applied to unravel latent similarities among genes and/or experiments. Most of the current systems for microarray data analysis use statistical methods, hierarchical clustering, self-organizing maps, support vector machines, or k-means clustering to organize genes or experiments into 'meaningful' groups. Without prior explicit bias almost all of these clustering methods applied to gene expression data not only produce different results, but may also produce clusters with little or no biological relevance. Of these methods, agglomerative hierarchical clustering has been the most widely applied, although many limitations have been identified.

RESULTS

Starting with a systematic comparison of the underlying theories behind clustering approaches, we have devised a technique that combines tree-structured vector quantization and partitive k-means clustering (BTSVQ). This hybrid technique has revealed clinically relevant clusters in three large publicly available data sets. In contrast to existing systems, our approach is less sensitive to data preprocessing and data normalization. In addition, the clustering results produced by the technique have strong similarities to those of self-organizing maps (SOMs). We discuss the advantages and the mathematical reasoning behind our approach.
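
The abstract does not give algorithmic details, so the following is only a minimal sketch of the general idea it names: a binary tree built by recursively splitting the data with a two-centroid k-means step. It is not the authors' BTSVQ implementation. The function names (`two_means`, `build_tree`, `leaves`), the farthest-point initialisation, the stopping rule (`max_depth`, `min_size`), and the toy matrix are all assumptions made for illustration.

```python
import numpy as np


def two_means(X, n_iter=50):
    """Split the rows of X into two groups with Lloyd iterations (k = 2).

    Assumption: centroids start from a deterministic farthest-point rule,
    chosen here only to keep the sketch reproducible.
    """
    mean = X.mean(axis=0)
    first = ((X - mean) ** 2).sum(axis=1).argmax()
    second = ((X - X[first]) ** 2).sum(axis=1).argmax()
    centroids = np.stack([X[first], X[second]]).astype(float)

    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance of every row to both centroids.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            members = X[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    return labels, centroids


def build_tree(X, indices=None, depth=0, max_depth=3, min_size=2):
    """Recursively bisect the rows of X; each node stores its row indices
    and its centroid (the codebook vector of that node)."""
    if indices is None:
        indices = np.arange(len(X))
    node = {"indices": indices, "centroid": X[indices].mean(axis=0), "children": []}
    if depth >= max_depth or len(indices) < 2 * min_size:
        return node
    labels, _ = two_means(X[indices])
    left, right = indices[labels == 0], indices[labels == 1]
    if len(left) < min_size or len(right) < min_size:
        return node  # hypothetical stopping rule: refuse tiny clusters
    node["children"] = [
        build_tree(X, left, depth + 1, max_depth, min_size),
        build_tree(X, right, depth + 1, max_depth, min_size),
    ]
    return node


def leaves(node):
    """Return the row-index arrays of all leaf nodes, left to right."""
    if not node["children"]:
        return [node["indices"]]
    return [leaf for child in node["children"] for leaf in leaves(child)]


# Toy expression matrix (rows = samples, columns = genes); values are made up.
A = np.array([[0.0, 0.1, 0.2, 0.0],
              [0.1, 0.0, 0.1, 0.2],
              [0.2, 0.1, 0.0, 0.1],
              [0.0, 0.2, 0.1, 0.1]])
X = np.vstack([A, A + 10.0])
tree = build_tree(X, max_depth=1)
for leaf in leaves(tree):
    print(leaf.tolist())
```

In this sketch each level of the tree is a partitive two-way split, so the hierarchy is built top-down rather than by agglomerative merging.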
