Yu Xueping, Lin Jimmy, Zack Donald J, Qian Jiang
Wilmer Institute, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA.
BMC Bioinformatics. 2007 Nov 9;8:437. doi: 10.1186/1471-2105-8-437.
Evolutionary conservation has been used successfully to help identify cis-acting DNA regions that are important in regulating tissue-specific gene expression. Motivated by increasing evidence that some DNA regulatory regions are not evolutionarily conserved, we have developed an approach for cis-regulatory region identification that does not rely upon evolutionary sequence conservation.
The conservation-independent approach is based on an empirical potential energy between interacting transcription factors (TFs). In this analysis, the potential energy is defined as a function of the number of TF interactions in a genomic region and the strength of the interactions. By identifying sets of interacting TFs, the analysis locates regions enriched with the binding sites of these interacting TFs. We applied this approach to 30 human tissues and identified 6232 putative cis-regulatory modules (CRMs) regulating 2130 tissue-specific genes. Interestingly, some genes appear to be regulated by different CRMs in different tissues. Known regulatory regions are highly enriched in our predicted CRMs. In addition, DNase I hypersensitive sites, which tend to be associated with active regulatory regions, significantly overlap with the predicted CRMs, but not with more conserved regions. We also find that conserved and non-conserved CRMs regulate distinct gene groups. Conserved CRMs control more essential genes and genes involved in fundamental cellular activities such as transcription. In contrast, non-conserved CRMs, in general, regulate more non-essential genes, such as genes related to neural activity.
These results demonstrate that identifying relevant sets of binding motifs can help in the mapping of DNA regulatory regions, and suggest that non-conserved CRMs play an important role in gene regulation.
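The abstract states only that the potential energy depends on the number of TF interactions in a region and on their strength; it does not give the functional form. The Python sketch below is therefore an illustration under stated assumptions, not the authors' implementation: it assumes a simple additive energy (each nearby pair of binding sites contributes the negative of its interaction strength) and a fixed-size sliding window. The names `region_energy`, `scan`, `max_gap`, the placeholder TF labels, and all numeric values are hypothetical.

```python
from itertools import combinations


def region_energy(sites, strength, max_gap=50):
    """Assumed additive energy: every pair of binding sites lying within
    max_gap bp of each other contributes minus its interaction strength.
    More interactions and stronger interactions give a lower energy."""
    energy = 0.0
    for (pos_a, tf_a), (pos_b, tf_b) in combinations(sorted(sites), 2):
        if pos_b - pos_a <= max_gap:
            # Unordered TF pair; pairs missing from the table contribute 0.
            energy -= strength.get(frozenset((tf_a, tf_b)), 0.0)
    return energy


def scan(sites, strength, region_len, window=200, step=100, max_gap=50):
    """Slide a window along the region and score each window.
    Returns a list of (start, end, energy) tuples."""
    results = []
    for start in range(0, max(region_len - window, 0) + 1, step):
        end = start + window
        in_window = [(p, tf) for p, tf in sites if start <= p < end]
        results.append((start, end, region_energy(in_window, strength, max_gap)))
    return results


# Hypothetical input: (position, TF label) binding-site hits and a
# table of pairwise interaction strengths (illustrative values only).
sites = [(10, "TF_A"), (40, "TF_B"), (60, "TF_A"), (500, "TF_C")]
strength = {frozenset(("TF_A", "TF_B")): 2.0}

# The lowest-energy window is the candidate module under these assumptions.
best = min(scan(sites, strength, region_len=600), key=lambda r: r[2])
print(best)
```

In this toy input, the window containing the cluster of TF_A and TF_B sites scores lowest because it holds two interacting pairs, whereas the isolated TF_C site contributes nothing. A real analysis would need tissue-specific interaction strengths and a significance threshold, neither of which is specified in the abstract.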