Belacel Nabil, Cuperlović-Culf Miroslava, Laflamme Mark, Ouellette Rodney
National Research Council Canada, Institute for Information Technology-e-Health group, 127 Carleton Street, St-John, NB, Canada E2L2Z6.
Bioinformatics. 2004 Jul 22;20(11):1690-701. doi: 10.1093/bioinformatics/bth142. Epub 2004 Feb 26.
In the interpretation of gene expression data from a group of microarray experiments that include samples from different patients or different conditions, special consideration must be given to the pleiotropic and epistatic roles of genes, as observed in the variation of gene coexpression patterns. Crisp (hard) clustering methods assign each gene to exactly one cluster, thereby discarding information about the multiple roles of genes.
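To make the crisp-versus-fuzzy distinction concrete, here is a minimal Python/NumPy sketch. It is not the authors' C software. It assumes the standard Fuzzy C-Means membership formula with fuzzifier m = 2, where a gene's membership in cluster k is u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)). The helper names `fcm_memberships`, `crisp_labels` and `multi_cluster_genes`, and the 0.3 membership threshold, are hypothetical illustrations. The abstract does not say how memberships are thresholded.

```python
import numpy as np


def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    """Standard Fuzzy C-Means memberships of each gene profile (row of X).

    u[i, k] = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)); every row sums to 1.
    """
    # Euclidean distance from every gene to every cluster center, shape (n, c)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, eps)  # guard against a gene lying exactly on a center
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)


def crisp_labels(u):
    """Crisp view: keep only the single best cluster for each gene."""
    return u.argmax(axis=1)


def multi_cluster_genes(u, threshold=0.3):
    """Fuzzy view: list every cluster whose membership reaches the
    (hypothetical) threshold, so one gene can belong to several clusters."""
    return [np.flatnonzero(row >= threshold).tolist() for row in u]


# Two clusters; the third gene lies exactly halfway between them.
X = np.array([[0.0, 0.1], [4.0, 0.1], [2.0, 0.0]])
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
u = fcm_memberships(X, centers)
print(crisp_labels(u))         # the halfway gene is forced into a single cluster
print(multi_cluster_genes(u))  # [[0], [1], [0, 1]]
```

The crisp labelling keeps only one cluster for the halfway gene. The fuzzy memberships (0.5 and 0.5) retain the information that it relates equally to both clusters, which is the kind of multi-role signal the abstract argues crisp methods lose.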
Here, we present the application of a local search heuristic, Fuzzy J-Means, embedded in the variable neighborhood search (VNS) metaheuristic, to the clustering of microarray gene expression data. We show that, for all the datasets studied, this algorithm outperforms the standard Fuzzy C-Means heuristic. We also present several methods for using cluster membership information to determine gene coregulation. The clustering and data analyses were performed on simulated datasets as well as on experimental cDNA microarray data for breast cancer and human blood from the Stanford Microarray Database.
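The search strategy can be sketched in Python/NumPy. This is a simplified reading of the general idea under stated assumptions, not the authors' C implementation. Their neighbourhood evaluation, parameters and stopping rules may differ. The sketch makes three assumptions. First, the Fuzzy J-Means local search uses a "jump" neighbourhood that relocates one centroid onto an unoccupied gene profile. Second, the best improving jump is polished with ordinary Fuzzy C-Means alternating updates. Third, VNS "shakes" the incumbent by relocating k random centroids before each new local search, and it widens k only when no improvement is found. All function names, m = 2, `k_max`, `max_iter` and the tolerances are hypothetical.

```python
import numpy as np


def _distances(X, centers, eps=1e-12):
    # Euclidean gene-to-centroid distances, clipped away from zero, shape (n, c)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.maximum(d, eps)


def fcm_memberships(X, centers, m=2.0):
    d = _distances(X, centers)
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)


def fcm_objective(X, centers, m=2.0):
    """J = sum_i sum_k u_ik^m * d_ik^2, with memberships optimal for the centers."""
    u = fcm_memberships(X, centers, m)
    return float(np.sum(u ** m * _distances(X, centers) ** 2))


def fcm_refine(X, centers, m=2.0, tol=1e-6, max_steps=100):
    """Plain Fuzzy C-Means alternating updates (the baseline heuristic)."""
    for _ in range(max_steps):
        w = fcm_memberships(X, centers, m) ** m           # (n, c) weights
        new = (w.T @ X) / w.sum(axis=0)[:, None]          # weighted means
        if np.abs(new - centers).max() < tol:
            return new
        centers = new
    return centers


def fjm_local_search(X, centers, m=2.0, tol=1e-9):
    """Jump-based local search: try relocating each centroid onto each
    unoccupied gene profile, take the best improving jump, polish with FCM."""
    centers = fcm_refine(X, centers, m)
    best = fcm_objective(X, centers, m)
    while True:
        best_obj, best_cand = best - tol, None
        for k in range(len(centers)):
            for i in range(len(X)):
                if np.any(np.all(np.isclose(centers, X[i]), axis=1)):
                    continue  # a centroid already sits on gene i
                cand = centers.copy()
                cand[k] = X[i]
                obj = fcm_objective(X, cand, m)
                if obj < best_obj:
                    best_obj, best_cand = obj, cand
        if best_cand is None:
            return centers  # no improving jump: local optimum
        centers = fcm_refine(X, best_cand, m)  # FCM updates never increase J
        best = fcm_objective(X, centers, m)


def vns_fjm(X, c, m=2.0, k_max=3, max_iter=20, seed=0):
    """Basic VNS: shake k centroids, run local search, move only if better."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    centers = fjm_local_search(X, centers, m)
    best = fcm_objective(X, centers, m)
    for _ in range(max_iter):
        k = 1
        while k <= k_max:
            # Shake: relocate k random centroids onto random gene profiles
            cand = centers.copy()
            idx = rng.choice(c, size=min(k, c), replace=False)
            cand[idx] = X[rng.choice(len(X), size=len(idx), replace=False)]
            cand = fjm_local_search(X, cand, m)
            obj = fcm_objective(X, cand, m)
            if obj < best - 1e-9:
                centers, best, k = cand, obj, 1  # improvement: restart at k = 1
            else:
                k += 1                           # otherwise widen the shake
    return centers, best
```

Calling `vns_fjm(X, c)` on a genes-by-arrays matrix `X` returns c centroids and their FCM objective. `fcm_memberships(X, centers)` then gives the fuzzy memberships. Plain FCM (`fcm_refine` from a single random start) only descends to the nearest local optimum. The jump moves and VNS shaking are one way to escape such optima, at the cost of many more objective evaluations.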
The source code of the clustering software (C programming language) is freely available from Nabil.Belacel@nrc-cnrc.gc.ca