Chan Bob Y, Kibler Dennis
School of Information and Computer Science, University of California, Irvine, California, USA.
BMC Bioinformatics. 2005 Oct 27;6:262. doi: 10.1186/1471-2105-6-262.
Cis-regulatory modules (CRMs) are short stretches of DNA that help regulate gene expression in higher eukaryotes. They have been found up to 1 megabase away from the genes they regulate and can be located upstream, downstream, and even within their target genes. Due to the difficulty of finding CRMs using biological and computational techniques, even well-studied regulatory systems may contain CRMs that have not yet been discovered.
We present a simple, efficient method (HexDiff) based only on hexamer frequencies of known CRMs and non-CRM sequence to predict novel CRMs in regulatory systems. On a data set of 16 gap and pair-rule genes containing 52 known CRMs, predictions made by HexDiff had a higher correlation with the known CRMs than several existing CRM prediction algorithms: Ahab, Cluster Buster, MSCAN, MCAST, and LWF. After combining the results of the different algorithms, 10 putative CRMs were identified and are strong candidates for future study. The hexamers used by HexDiff to distinguish between CRMs and non-CRM sequence were also analyzed and were shown to be enriched in regulatory elements.
HexDiff provides an efficient and effective means for finding new CRMs based on known CRMs, rather than known binding sites.
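To make the idea of scoring sequence by hexamer frequencies concrete, the following is a minimal Python sketch, not the published HexDiff implementation. The abstract states only that the method relies on hexamer frequencies in known CRMs and in non-CRM sequence. Everything else here is an assumption made for illustration: the frequency-ratio ranking, the pseudocount, the `top_n` cutoff, the window size, and all function names.

```python
from collections import Counter
from itertools import product

K = 6  # hexamer length


def hexamer_freqs(seqs, pseudocount=1.0):
    """Relative frequency of every possible hexamer across a list of DNA strings.

    The pseudocount (an assumption of this sketch) keeps unseen hexamers
    from producing zero frequencies.
    """
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - K + 1):
            kmer = s[i:i + K]
            if set(kmer) <= set("ACGT"):  # skip hexamers containing N or other symbols
                counts[kmer] += 1
    total = sum(counts.values()) + pseudocount * 4 ** K
    return {
        "".join(p): (counts["".join(p)] + pseudocount) / total
        for p in product("ACGT", repeat=K)
    }


def discriminative_hexamers(crm_seqs, background_seqs, top_n=50):
    """Pick the hexamers most over-represented in CRMs relative to background.

    Ranking by frequency ratio is an assumed criterion, chosen for simplicity.
    """
    crm = hexamer_freqs(crm_seqs)
    bg = hexamer_freqs(background_seqs)
    ratio = {h: crm[h] / bg[h] for h in crm}
    return set(sorted(ratio, key=ratio.get, reverse=True)[:top_n])


def window_scores(seq, hexamers, window=100):
    """For each sliding window, count the selected hexamers that start inside it."""
    seq = seq.upper()
    hits = [1 if seq[i:i + K] in hexamers else 0 for i in range(len(seq) - K + 1)]
    running = sum(hits[:window])
    scores = [running]
    for i in range(window, len(hits)):
        running += hits[i] - hits[i - window]  # slide the window by one position
        scores.append(running)
    return scores


if __name__ == "__main__":
    # Toy sequences, invented purely for demonstration.
    selected = discriminative_hexamers(["GATTACAGATTACA"], ["CCCCCCCCCCCCCC"], top_n=2)
    print(sorted(selected))
    print(window_scores("GATTACAGATTACA", selected, window=4))
```

In a sketch like this, windows whose score exceeds some threshold would be reported as candidate CRMs. How that threshold is chosen, and how the hexamers are actually selected in HexDiff, is not specified in the abstract.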