Carmona-Saez Pedro, Chagoyen Monica, Rodriguez Andres, Trelles Oswaldo, Carazo Jose M, Pascual-Montano Alberto
BioComputing Unit, National Center for Biotechnology (CNB-CSIC), Cantoblanco, 28049, Madrid, Spain.
BMC Bioinformatics. 2006 Feb 7;7:54. doi: 10.1186/1471-2105-7-54.
Microarray technology generates huge amounts of data on the expression levels of thousands of genes, or even whole genomes, across different experimental conditions. To extract biological knowledge from such datasets and to understand them fully, it is essential to include external biological information about genes and gene products in the analysis of expression data. However, most current approaches to microarray analysis focus mainly on the experimental data, and external biological information is incorporated only afterwards, as a post-processing step.
In this study we present a method for the integrative analysis of microarray data based on the Association Rules Discovery data mining technique. The approach integrates gene annotations and expression data to discover intrinsic associations between the two data sources based on co-occurrence patterns. We applied the proposed methodology to gene expression datasets in which genes were annotated with metabolic pathways, transcriptional regulators and Gene Ontology categories. The automatically extracted associations revealed significant relationships among these gene attributes and expression patterns, many of which are clearly supported by recently reported work.
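As an illustration of the co-occurrence idea, the following is a minimal sketch and not the implementation distributed with Engene. It assumes that each gene is treated as a transaction whose items are its annotations plus discretized expression states, and that rules are filtered with the standard support and confidence measures. The gene names, the item labels, the thresholds, the choice of annotations as antecedents, and the helper functions `support` and `mine_rules` are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: one "transaction" per gene, mixing an annotation
# item with discretized expression items (condition:state).
transactions = {
    "geneA": {"pathway:glycolysis", "t1:up", "t2:up"},
    "geneB": {"pathway:glycolysis", "t1:up", "t2:up"},
    "geneC": {"pathway:glycolysis", "t1:up"},
    "geneD": {"pathway:ribosome", "t1:down"},
    "geneE": {"pathway:ribosome", "t1:down", "t2:up"},
}


def support(itemset, transactions):
    """Fraction of genes whose item set contains every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)


def mine_rules(transactions, min_support=0.4, min_confidence=0.8):
    """Brute-force search for rules {annotation} -> {expression items}.

    Assumption of this sketch: antecedents are single annotation items and
    consequents are one or two expression items. Thresholds are arbitrary.
    """
    items = sorted(set().union(*transactions.values()))
    annotations = [i for i in items if i.startswith("pathway:")]
    expression = [i for i in items if not i.startswith("pathway:")]

    rules = []
    for ann in annotations:
        ann_support = support((ann,), transactions)
        for size in (1, 2):
            for consequent in combinations(expression, size):
                rule_support = support((ann, *consequent), transactions)
                if rule_support < min_support:
                    continue
                # Confidence: share of annotated genes showing the pattern.
                confidence = rule_support / ann_support
                if confidence >= min_confidence:
                    rules.append((ann, consequent, rule_support, confidence))
    return rules


rules = mine_rules(transactions)
for antecedent, consequent, sup, conf in rules:
    print(f"{antecedent} -> {consequent}  support={sup:.2f}  confidence={conf:.2f}")
```

On this toy input, the rules that pass both thresholds link each pathway annotation to the expression state shared by all of its genes. A realistic analysis would replace the exhaustive enumeration with a frequent-itemset algorithm such as Apriori and would add further annotation types, for example transcriptional regulators and Gene Ontology categories.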
The integration of external biological information and gene expression data can provide insights into the biological processes associated with gene expression programs. In this paper we show that the proposed methodology can integrate multiple gene annotations and expression data in the same analytic framework and extract meaningful associations among heterogeneous sources of data. An implementation of the method is included in the Engene software package.