Nam Hojung, Lee KiYoung, Lee Doheon
Department of Bio and Brain Engineering, KAIST, 373-1 Guseong-dong, Yuseong-gu, Daejeon, Korea.
BMC Bioinformatics. 2009 Mar 19;10 Suppl 3(Suppl 3):S6. doi: 10.1186/1471-2105-10-S3-S6.
One of the most challenging problems in mining gene expression data is identifying how the expression of a particular gene affects the expression of other genes. To elucidate the relationships between genes, association rule mining (ARM) has been applied to microarray gene expression data. However, conventional ARM is limited in its ability to extract temporal dependencies between gene expressions, even though temporal information is indispensable for discovering the underlying regulation mechanisms in biological pathways. In this paper, we propose a novel method, referred to as temporal association rule mining (TARM), which can extract temporal dependencies among related genes. A temporal association rule has the form [gene A↑, gene B↓] → (7 min) [gene C↑], which states that a high expression level of gene A together with significant repression of gene B is followed by significant expression of gene C after 7 minutes. The proposed TARM method is tested on a Saccharomyces cerevisiae cell cycle time-series microarray gene expression data set.
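As an illustration of the rule form described above, the following minimal Python sketch shows one way such a rule could be represented and checked against discretized time-series data. The class name `TemporalRule`, the helper `holds_at`, the `(gene, state)` item encoding, and the `step_min` sampling interval are all assumptions made for this example; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemporalRule:
    """Hypothetical container for a rule such as [A up, B down] -> (7 min) [C up]."""
    antecedent: frozenset  # items of the form (gene, "up" | "down")
    consequent: frozenset
    delay_min: int         # transcriptional time delay in minutes

    def __str__(self):
        arrow = {"up": "↑", "down": "↓"}

        def fmt(items):
            return ", ".join(f"{gene}{arrow[state]}" for gene, state in sorted(items))

        return f"[{fmt(self.antecedent)}] → ({self.delay_min} min) [{fmt(self.consequent)}]"


def holds_at(rule, states, t, step_min):
    """Return True if the antecedent holds at time index t and the
    consequent holds delay_min later.

    states: list with one set of (gene, state) items per time point.
    step_min: assumed sampling interval of the time series in minutes.
    """
    lag = rule.delay_min // step_min
    if t + lag >= len(states):
        return False  # the delayed time point falls outside the series
    return rule.antecedent <= states[t] and rule.consequent <= states[t + lag]
```

For example, a rule built from `{("geneA", "up"), ("geneB", "down")}` and `{("geneC", "up")}` with a 7-minute delay prints in the bracketed form used in the text, and `holds_at` reports whether it is satisfied at a given time index.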
In the parameter fitting phase of TARM, the parameter set [threshold = ±0.8, support ≥ 3 transactions, confidence ≥ 90%] that gave the best precision score against the KEGG cell cycle pathway was chosen for the rule mining phase. With the fitted parameter set, a number of temporal association rules with five transcriptional time delays (0, 7, 14, 21, 28 minutes) were extracted from the gene expression data of 799 genes pre-identified as cell cycle relevant. From the extracted rules, we identified associated genes that play the same role in biological processes within a short transcriptional time delay, as well as some temporal dependencies between genes with specific biological processes.
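The role of the three fitted parameters can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes that expression values are thresholded symmetrically (inclusive at ±0.8), that each time point paired with its delayed time point forms one transaction, that support is an absolute transaction count, and that delays are expressed as a number of sampling steps (`lag`). The function names are hypothetical.

```python
def discretize(expr, threshold=0.8):
    """Turn expression values into per-time-point sets of (gene, state) items.

    expr: dict mapping gene name -> list of expression values, one per time point.
    Values >= threshold become "up", values <= -threshold become "down"
    (inclusive bounds are an assumption of this sketch).
    """
    n_points = len(next(iter(expr.values())))
    states = [set() for _ in range(n_points)]
    for gene, values in expr.items():
        for t, value in enumerate(values):
            if value >= threshold:
                states[t].add((gene, "up"))
            elif value <= -threshold:
                states[t].add((gene, "down"))
    return states


def support_confidence(states, antecedent, consequent, lag):
    """Count transactions supporting antecedent -> consequent at a given lag.

    Returns (support, confidence), where support is the number of time
    points t at which the antecedent holds at t and the consequent holds
    at t + lag, and confidence is support divided by the number of time
    points at which the antecedent holds.
    """
    usable = range(len(states) - lag)
    n_antecedent = sum(1 for t in usable if antecedent <= states[t])
    n_both = sum(
        1 for t in usable
        if antecedent <= states[t] and consequent <= states[t + lag]
    )
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return n_both, confidence


def passes(states, antecedent, consequent, lag, min_support=3, min_conf=0.9):
    """Apply the support and confidence cut-offs quoted in the text."""
    support, confidence = support_confidence(states, antecedent, consequent, lag)
    return support >= min_support and confidence >= min_conf
```

With a sampling interval of 7 minutes (an assumption consistent with the delays listed above), `lag` values 0 through 4 would correspond to delays of 0, 7, 14, 21 and 28 minutes.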
In this work, we proposed TARM, an applied form of conventional ARM. TARM showed a higher precision score than a dynamic Bayesian network and a Bayesian network. The advantages of TARM are that it reports the size of the transcriptional time delay between associated genes, the activation and inhibition relationships between genes, and sets of co-regulators.