Wan Raymond, Kiseleva Larisa, Harada Hajime, Mamitsuka Hiroshi, Horton Paul
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, 611-0011, Japan.
Source Code Biol Med. 2009 Nov 20;4:8. doi: 10.1186/1751-0473-4-8.
Visualization tools give researchers a global view of the interrelationships between the probes or experiments of a gene expression (e.g., microarray) data set. Existing methods include hierarchical clustering and k-means. In recent years, others have proposed applying minimum spanning trees (MSTs) to microarray clustering. Although MST-based clustering is formally equivalent to the dendrograms produced by hierarchical clustering under certain conditions, the two can look quite different.
HAMSTER (Helpful Abstraction using Minimum Spanning Trees for Expression Relations) is an open source system for generating a set of MSTs from the experiments of a microarray data set. Whereas previous work generated a single MST from a data set for clustering, we recursively merge experiments and repeat the process to obtain a set of MSTs for visualization. Depending on the parameters chosen, each tree is analogous to a snapshot of one step of the hierarchical clustering process. We score and rank these trees using one of three proposed schemes. HAMSTER is implemented in C++ and uses Graphviz to lay out each MST.
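The merge-and-repeat idea described above can be illustrated with a minimal Python sketch. It is not HAMSTER's implementation (which is written in C++), and several details are assumptions made only for illustration: Euclidean distance between expression profiles, merging the two experiments joined by the shortest MST edge, and replacing them with their unweighted mean. The function names `prim_mst`, `mst_series`, and `to_dot` are hypothetical. The scoring schemes are not reproduced here.

```python
import math


def euclidean(a, b):
    # Assumed distance measure; the abstract does not specify one.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def prim_mst(points):
    """Return MST edges (i, j, weight) of the complete graph on points (Prim's algorithm)."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mst_series(profiles, labels):
    """Build an MST, merge the endpoints of its shortest edge, and repeat.

    Returns a list of (labels, edges) pairs, one per tree.
    Merging by unweighted mean is an assumption of this sketch.
    """
    profiles = [list(p) for p in profiles]
    labels = list(labels)
    series = []
    while len(profiles) >= 2:
        edges = prim_mst(profiles)
        series.append((list(labels), edges))
        i, j, _ = min(edges, key=lambda e: e[2])
        merged = [(x + y) / 2 for x, y in zip(profiles[i], profiles[j])]
        merged_label = labels[i] + "+" + labels[j]
        keep = [k for k in range(len(profiles)) if k not in (i, j)]
        profiles = [profiles[k] for k in keep] + [merged]
        labels = [labels[k] for k in keep] + [merged_label]
    return series


def to_dot(labels, edges):
    """Render one tree as Graphviz DOT text (undirected graph, edge weights as labels)."""
    lines = ["graph mst {"]
    for i, j, w in edges:
        lines.append(f'  "{labels[i]}" -- "{labels[j]}" [label="{w:.2f}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Toy data: four experiments, two probes each (made up for illustration).
    toy = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.0, 0.5]]
    for tree_labels, tree_edges in mst_series(toy, ["t0", "t1", "t2", "t3"]):
        print(to_dot(tree_labels, tree_edges))
```

Each DOT string printed by the sketch could be passed to a Graphviz layout program to obtain an image, which mirrors the division of labor the abstract describes: tree construction in the program, layout in Graphviz.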
We report the running time of HAMSTER and demonstrate, using data sets from the NCBI Gene Expression Omnibus (GEO), that the images created by HAMSTER offer insights that differ from the dendrograms of hierarchical clustering. In addition to the C++ program, which is available as open source, we provide a web-based version (HAMSTER+) that lets users apply our system through a web browser without any programming knowledge.
Researchers may find it helpful to include HAMSTER in their microarray analysis workflow, as it can offer insights that differ from those of hierarchical clustering. We believe that HAMSTER would be useful for certain types of gradient data sets (e.g., time-series data) and for data that indicate relationships between cells or tissues. Both the source code and the web server variant of HAMSTER are available from http://hamster.cbrc.jp/.