Department of Computer Science, The University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78249, USA.
BMC Genomics. 2013;14 Suppl 1(Suppl 1):S4. doi: 10.1186/1471-2164-14-S1-S4. Epub 2013 Jan 21.
Deciphering cis-regulatory networks has become an attractive yet challenging task. This paper presents a simple method for cis-regulatory network discovery that aims to avoid some common problems of previous approaches.
The method takes promoter sequences and gene expression profiles as input. Rather than clustering genes by their expression data, it uses the co-expression neighborhood of each individual gene, thereby overcoming a disadvantage of current clustering-based models, which may miss information specific to individual genes. In addition, rather than taking a motif database as input, it builds a simple motif count table that records, for each gene promoter sequence, the count of every enumerated k-mer. It can therefore be applied to species for which no prior knowledge of cis-regulatory motifs is available, and it has the potential to discover new transcription factor binding sites. Applications to Saccharomyces cerevisiae and Arabidopsis show that the method achieves good prediction accuracy and outperforms a phylogenetic footprinting approach. Furthermore, the top-ranked gene-motif regulatory clusters are evidently functionally co-regulated, and the regulatory relationships between the motifs and the enriched biological functions can often be confirmed by the literature.
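The two inputs described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the co-expression neighborhood is defined or how motifs are scored, so the use of Pearson correlation, the fixed neighborhood size, and the function names `kmer_count_table` and `coexpression_neighborhood` are all assumptions made for illustration, as are the toy gene names and data.

```python
from itertools import product
from math import sqrt


def kmer_count_table(promoters, k):
    """Count every DNA k-mer in each promoter (overlapping occurrences)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    table = {}
    for gene, seq in promoters.items():
        counts = dict.fromkeys(kmers, 0)
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if word in counts:  # skip windows with N or other symbols
                counts[word] += 1
        table[gene] = counts
    return table


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_neighborhood(expression, gene, size):
    """Assumed definition: the `size` genes most correlated with `gene`."""
    scores = [
        (pearson(expression[gene], profile), other)
        for other, profile in expression.items()
        if other != gene
    ]
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scores[:size]]


# Hypothetical toy data, for illustration only.
promoters = {"g1": "ACGTACGT", "g2": "ACGTTTTT", "g3": "GGGGCCCC"}
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
}

table = kmer_count_table(promoters, 2)
neighbors = coexpression_neighborhood(expression, "g1", 1)
```

Because each gene gets its own neighborhood instead of a shared cluster label, the neighborhood of one gene can overlap with that of another without forcing both into the same group. A per-gene analysis could then ask which k-mers in `table` are over-represented among a gene's neighbors; that scoring step is not described in the abstract and is left out here.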
Because the method is simple and gene-specific, it can be readily applied to insufficiently studied species, or used flexibly as an additional step or data source for existing transcriptional regulatory network discovery models.