Department of Biological Sciences, Korea Advanced Institute of Science and Technology, Daejeon, South Korea.
Nucleic Acids Res. 2011 Sep 1;39(17):e116. doi: 10.1093/nar/gkr516. Epub 2011 Jun 30.
The identification of genome-wide cis-regulatory modules (CRMs) and the characterization of their associated epigenetic features are fundamental steps toward understanding gene regulatory networks. Although integrative analysis of available genome-wide information can provide new biological insights, the lack of novel methodologies has become a major bottleneck. Here, we present a comprehensive analysis tool called combinatorial CRM decoder (CCD), which uses publicly available information to identify and characterize genome-wide CRMs in a species of interest. CCD first defines the set of epigenetic features that is significantly associated with a set of known CRMs as a code called the 'trace code', and subsequently uses the trace code to pinpoint putative CRMs throughout the genome. Using 61 genome-wide data sets obtained from 17 independent mouse studies, CCD successfully catalogued ~12,600 CRMs (five distinct classes), including polycomb repressive complex 2 target sites as well as imprinting control regions. Interestingly, we discovered that ~4% of the identified CRMs belong to at least two different classes; we term these 'multi-functional CRMs', and their membership in multiple classes suggests functional importance for regulating spatiotemporal gene expression. From these examples, we show that CCD can be applied to any genome-wide data set and should therefore help uncover genome-wide CRMs in various species.
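The two-step idea described above (derive a trace code from known CRMs, then scan the genome with it) can be illustrated with a minimal sketch. This is not the CCD implementation: the fixed genomic windows, the one-sided hypergeometric enrichment test, the 0.05 threshold, the rule that a window must carry every feature in the code, and all function names and toy data are assumptions made purely for illustration.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n windows are drawn from N, of which K carry the feature."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def derive_trace_code(windows, known, alpha=0.05):
    """Return the features enriched in known CRM windows (assumed test, assumed alpha).

    windows: dict mapping window id -> set of feature names observed there
    known:   set of window ids that overlap known CRMs
    """
    N, n = len(windows), len(known)
    features = set().union(*windows.values())
    code = set()
    for feature in features:
        K = sum(1 for feats in windows.values() if feature in feats)
        k = sum(1 for w in known if feature in windows[w])
        if hypergeom_tail(k, K, n, N) < alpha:
            code.add(feature)
    return frozenset(code)


def scan(windows, code):
    """Return ids of windows carrying every feature in the trace code."""
    return sorted(w for w, feats in windows.items() if code and code <= feats)


# Hypothetical toy genome: 20 windows, windows 0-3 overlap known CRMs.
windows = {i: set() for i in range(20)}
for i in (0, 1, 2, 3, 4, 5):
    windows[i].add("H3K4me1")
for i in (0, 1, 2, 3, 4):
    windows[i].add("p300")
for i in (0, 10, 11, 12, 13, 14):
    windows[i].add("H3K9me3")
known = {0, 1, 2, 3}

trace_code = derive_trace_code(windows, known)
candidates = scan(windows, trace_code)
novel = [w for w in candidates if w not in known]
print(sorted(trace_code), candidates, novel)
```

In this toy setup, the two features concentrated in the known windows form the trace code, and the scan recovers the known windows plus one additional candidate. A real analysis would need multiple-testing correction and a principled way to map heterogeneous data sets onto genomic intervals, neither of which is attempted here.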