Laderas Ted, McWeeney Shannon
Informatics Shared Resource, OHSU Cancer Institute, Portland, Oregon 97201, USA.
OMICS. 2007 Spring;11(1):116-28. doi: 10.1089/omi.2006.0008.
The large variety of clustering algorithms and their variants can be daunting to researchers wishing to explore patterns within their microarray datasets. Furthermore, each clustering method has distinct biases in finding patterns within the data, and clusterings may not be reproducible across different algorithms. A consensus approach utilizing multiple algorithms can show where the various methods agree and expose robust patterns within the data. In this paper, we present Consense, a software package written for R/Bioconductor that uses such an approach to explore microarray datasets. Consense produces clustering results for each of the clustering methods, along with a report of metrics comparing the individual clusterings. One feature of Consense is the identification of genes that cluster consistently with an index gene across methods. Using simulated microarray data, we explore the sensitivity of the metrics to the biases of the different clustering algorithms. The framework is easily extensible, allowing the tool to be applied to other functional genomic data types, as well as to other high-throughput OMICS data types generated from metabolomic and proteomic experiments. It also provides a flexible environment for benchmarking new clustering algorithms. Consense is currently available as an installable R/Bioconductor package (http://www.ohsucancer.com/isrdev/consense/).
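The two ideas in the abstract, comparing clusterings across methods and finding genes that consistently cluster with an index gene, can be illustrated with a minimal sketch. It is written in Python rather than R and is not Consense's API or its actual metrics. The function names, the toy gene labels, and the cluster assignments are all hypothetical. The Rand index is used here as one common agreement measure; the abstract does not say which metrics Consense reports.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two clusterings agree.

    A pair counts as an agreement when both clusterings put the two items
    together, or both put them apart.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def coclustering_with_index(clusterings, genes, index_gene):
    """For each other gene, the fraction of methods that place it in the
    same cluster as index_gene."""
    k = genes.index(index_gene)
    n_methods = len(clusterings)
    return {
        gene: sum(labels[i] == labels[k] for labels in clusterings.values())
        / n_methods
        for i, gene in enumerate(genes)
        if i != k
    }


# Hypothetical toy data: cluster labels assigned to five genes by three
# methods. Label values are arbitrary within each method.
genes = ["g1", "g2", "g3", "g4", "g5"]
clusterings = {
    "hierarchical": [0, 0, 1, 1, 2],
    "kmeans": [1, 1, 0, 0, 0],
    "pam": [0, 0, 0, 1, 1],
}

# Pairwise agreement between every pair of methods.
agreement = {
    (a, b): rand_index(clusterings[a], clusterings[b])
    for a, b in combinations(clusterings, 2)
}

# Genes that co-cluster with the index gene "g1" under every method.
scores = coclustering_with_index(clusterings, genes, "g1")
consistent = [gene for gene, score in scores.items() if score == 1.0]
```

In this toy example only `g2` shares a cluster with `g1` under all three methods. Requiring agreement from every method is the strictest possible threshold; a lower cutoff on `scores` would give a more permissive consensus.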