Gyenesei Attila, Wagner Ulrich, Barkow-Oesterreicher Simon, Stolte Etzard, Schlapbach Ralph
Knowledge and Data Analysis, Unilever Research Vlaardingen, Vlaardingen, The Netherlands.
Bioinformatics. 2007 Aug 1;23(15):1927-35. doi: 10.1093/bioinformatics/btm276. Epub 2007 May 30.
Association pattern discovery (APD) methods have been successfully applied to gene expression data. They find groups of co-regulated genes in which the genes are either up- or down-regulated throughout the identified conditions. These methods, however, fail to identify similarly expressed genes whose expressions change between up- and down-regulation from one condition to another. In order to discover these hidden patterns, we propose the concept of mining co-regulated gene profiles. Co-regulated gene profiles contain two gene sets such that genes within the same set behave identically (up or down) while genes from different sets display contrary behavior. To reduce and group the large number of similar resulting patterns, we propose a new similarity measure that can be applied together with hierarchical clustering methods.
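The definition of a co-regulated gene profile above can be illustrated with a minimal sketch. It assumes expression values have already been discretized to +1 (up), -1 (down), or 0 (unchanged). The data, the function name `supporting_conditions`, and this encoding are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, Iterable, List

# Hypothetical discretized expression data: gene -> state per condition,
# where +1 = up-regulated, -1 = down-regulated, 0 = unchanged.
EXPR: Dict[str, List[int]] = {
    "g1": [+1, -1, +1, 0],
    "g2": [+1, -1, +1, +1],
    "g3": [-1, +1, -1, +1],
    "g4": [-1, +1, 0, -1],
}


def supporting_conditions(
    expr: Dict[str, List[int]],
    set_a: Iterable[str],
    set_b: Iterable[str],
) -> List[int]:
    """Return indices of conditions in which every gene in set_a shares one
    direction (up or down) and every gene in set_b shows the opposite one."""
    n_conditions = len(next(iter(expr.values())))
    genes_a, genes_b = list(set_a), list(set_b)
    support = []
    for c in range(n_conditions):
        a_states = {expr[g][c] for g in genes_a}
        b_states = {expr[g][c] for g in genes_b}
        # Each set must be internally uniform in this condition.
        if len(a_states) != 1 or len(b_states) != 1:
            continue
        a_state = next(iter(a_states))
        b_state = next(iter(b_states))
        # The two sets must be regulated in opposite, non-zero directions.
        if a_state != 0 and b_state == -a_state:
            support.append(c)
    return support


print(supporting_conditions(EXPR, {"g1", "g2"}, {"g3"}))
```

In this toy data, the profile ({g1, g2}, {g3}) is supported by conditions 0, 1 and 2, even though g1 and g2 switch from up to down in condition 1. A pattern that requires a single direction throughout would cover only conditions 0 and 2.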
We tested our proposed method on two well-known yeast microarray data sets. Our implementation mined the data effectively and discovered patterns of co-regulated genes that are hidden from traditional APD methods. The high content of biologically relevant information in these patterns is demonstrated by the significant enrichment of co-regulated genes with similar functions. Our experimental results show that the Mining Attribute Profile (MAP) method is an efficient tool for the analysis of gene expression data and is competitive with bi-clustering techniques.
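The abstract does not say which statistic was used to assess functional enrichment. A common choice for this kind of question is the hypergeometric tail probability, sketched below purely as an assumption. The function name `hypergeom_enrichment_p` and its parameters are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, hits: int) -> float:
    """Probability of seeing at least `hits` annotated genes in a random
    sample of `sample` genes drawn without replacement from `population`
    genes, of which `annotated` carry the functional category."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the hypergeometric probability mass from `hits` up to the maximum.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers only: 10 genes overall, 5 annotated, a pattern of 5 genes,
# all 5 of which carry the annotation.
print(hypergeom_enrichment_p(10, 5, 5, 5))
```

A small p-value indicates that the genes in a pattern share a functional category more often than expected by chance. In practice a multiple-testing correction would also be needed when many categories are tested.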