Department of Information and Communications Engineering, KAIST, Daejeon 305-701, South Korea.
Nucleic Acids Res. 2010 Jul;38(Web Server issue):W246-53. doi: 10.1093/nar/gkq516. Epub 2010 Jun 6.
Large microarray data sets have recently become common. However, most available clustering methods cannot easily handle them because of their high computational complexity and memory requirements. Furthermore, typical clustering methods construct oversimplified clusters that ignore subtle but meaningful changes in the expression patterns present in large microarray data sets. An efficient clustering method is therefore needed that identifies both absolute expression differences and expression profile patterns at different expression levels in large microarray data sets. This study presents CLIC, which meets these requirements for clustering analysis, particularly, but not exclusively, for large microarray data sets. CLIC is based on a novel concept: genes are first clustered in each individual dimension, and the ordinal labels of the clusters in each dimension are then used for further clustering across all dimensions. CLIC enables iterative sub-clustering into more homogeneous groups and the identification of common expression patterns among genes that were separated into different groups because of large differences in their expression levels. In addition, the clustering computation is parallelized, the number of clusters is detected automatically, and functional enrichment is provided for each cluster and pattern. CLIC is freely available at http://gexp2.kaist.ac.kr/clic.
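The two-stage idea described in the abstract (cluster each dimension on its own, then cluster on the resulting ordinal labels) can be illustrated with a small sketch. This is not the CLIC implementation: the abstract does not say how the one-dimensional clustering is done, so equal-width binning stands in for it here, and the second stage simply groups genes with identical label vectors. The function names `ordinal_labels_1d` and `two_stage_cluster`, the `n_bins` parameter, and the toy data are all hypothetical.

```python
from collections import defaultdict


def ordinal_labels_1d(values, n_bins=3):
    """Assign each value an ordinal label 0..n_bins-1 using equal-width bins.

    Assumption: equal-width binning is only a stand-in for whatever
    one-dimensional clustering CLIC actually uses.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # No spread in this dimension: every gene gets the same label.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def two_stage_cluster(expr, n_bins=3):
    """Group genes by their vector of per-dimension ordinal labels.

    expr maps a gene name to a list of expression values, one per sample.
    """
    genes = list(expr)
    n_dims = len(expr[genes[0]])

    # Stage 1: label the genes independently in each dimension (sample).
    per_dim = [
        ordinal_labels_1d([expr[g][d] for g in genes], n_bins)
        for d in range(n_dims)
    ]

    # Stage 2: genes with the same label vector fall into the same cluster.
    clusters = defaultdict(list)
    for i, gene in enumerate(genes):
        key = tuple(labels[i] for labels in per_dim)
        clusters[key].append(gene)
    return dict(clusters)


# Toy data (hypothetical): two rising genes and two falling genes.
expr = {
    "geneA": [1.0, 2.0, 9.0],
    "geneB": [2.0, 1.0, 8.0],
    "geneC": [9.0, 8.0, 1.0],
    "geneD": [8.0, 9.0, 2.0],
}
clusters = two_stage_cluster(expr, n_bins=2)
for pattern, members in clusters.items():
    print(pattern, members)
```

Because each dimension is labeled independently, stage 1 is a natural place for parallel execution, which is consistent with the parallelized computation mentioned in the abstract, although the sketch runs serially.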