Sharan R, Shamir R
Department of Computer Science, Tel-Aviv University, Israel.
Proc Int Conf Intell Syst Mol Biol. 2000;8:307-16.
Novel DNA microarray technologies enable the expression levels of thousands of genes to be monitored simultaneously. This allows a global view of the transcription levels of many (or all) genes when the cell undergoes specific conditions or processes. Analyzing gene expression data requires clustering genes into groups with similar expression patterns. We have developed a novel clustering algorithm, called CLICK, which is applicable to gene expression analysis as well as to other biological applications. No prior assumptions are made about the structure or the number of the clusters. The algorithm uses graph-theoretic and statistical techniques to identify tight groups of highly similar elements (kernels), which are likely to belong to the same true cluster. Several heuristic procedures are then used to expand the kernels into the full clustering. CLICK has been implemented and tested on a variety of biological datasets, including gene expression, cDNA oligo-fingerprinting, and protein sequence similarity data. In all of these applications it outperformed extant algorithms according to several common figures of merit. CLICK is also very fast, clustering thousands of elements in minutes, and over 100,000 elements in a couple of hours, on a regular workstation.
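The two-phase idea described above (find tight kernels, then expand them) can be illustrated with a small sketch. This is not the CLICK procedure itself, whose graph-theoretic and statistical details are not given in the abstract. As a simplified stand-in, it assumes that kernels are the connected components of a similarity graph built by thresholding Pearson correlation, and that expansion adopts each leftover element into the kernel whose mean profile it matches best. The function names (`find_kernels`, `expand_kernels`), both thresholds, and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def find_kernels(profiles, threshold=0.9, min_size=2):
    """Assumed stand-in for kernel detection: connected components of the
    graph that links two elements when their correlation is >= threshold."""
    n = len(profiles)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if pearson(profiles[i], profiles[j]) >= threshold:
                adj[i].append(j)
                adj[j].append(i)

    seen, kernels, singletons = set(), [], []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            v = stack.pop()
            component.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if len(component) >= min_size:
            kernels.append(sorted(component))
        else:
            singletons.extend(component)
    return kernels, singletons


def expand_kernels(profiles, kernels, singletons, adopt_threshold=0.8):
    """Assumed stand-in for expansion: adopt each singleton into the kernel
    whose mean profile it correlates with best, if that is good enough."""
    dims = len(profiles[0])
    # Centroids are computed once from the original kernels.
    centroids = [
        [sum(profiles[i][t] for i in k) / len(k) for t in range(dims)]
        for k in kernels
    ]
    clusters = [list(k) for k in kernels]
    unassigned = []
    for s in singletons:
        scores = [pearson(profiles[s], c) for c in centroids]
        if scores and max(scores) >= adopt_threshold:
            clusters[scores.index(max(scores))].append(s)
        else:
            unassigned.append(s)
    return clusters, unassigned


# Toy expression profiles (rows = genes, columns = conditions); made-up data.
profiles = [
    [1, 2, 3, 4],  # 0: rising
    [2, 4, 6, 8],  # 1: rising
    [4, 3, 2, 1],  # 2: falling
    [8, 6, 4, 2],  # 3: falling
    [1, 2, 3, 5],  # 4: roughly rising
    [1, 5, 1, 5],  # 5: oscillating
]

kernels, singletons = find_kernels(profiles, threshold=0.99)
clusters, unassigned = expand_kernels(profiles, kernels, singletons)
print(clusters)    # [[0, 1, 4], [2, 3]]
print(unassigned)  # [5]
```

In this toy run, the strict threshold yields two kernels (the rising and falling pairs), the roughly rising gene is adopted during expansion, and the oscillating gene stays unassigned. The number of clusters is not fixed in advance, which mirrors the abstract's point that no prior assumption is made about it.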