Xie Dan, Cai Jun, Chia Na-Yu, Ng Huck H, Zhong Sheng
Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
Genome Res. 2008 Aug;18(8):1325-35. doi: 10.1101/gr.072769.107. Epub 2008 May 15.
We introduce the GibbsModule algorithm for de novo detection of cis-regulatory motifs and modules in eukaryotic genomes. GibbsModule models the coexpressed genes within one species as sharing a core cis-regulatory motif, and each homologous gene group as sharing a homologous cis-regulatory module (CRM) characterized by a similar composition of motifs. Without relying on a predetermined alignment, GibbsModule iteratively updates the core motif shared by coexpressed genes and traces the homologous CRMs that contain the core motif. GibbsModule achieved substantial improvements in both precision and recall compared with peer algorithms on a number of synthetic and real data sets. Applying GibbsModule to the binding regions of the Krüppel-like factor (KLF) transcription factor in embryonic stem cells (ESCs), we discovered a motif that differs from a previously published KLF motif identified by a SELEX experiment but is consistent with mutagenesis analysis. The SOX2 motif was found to be a collaborating motif to the KLF motif in ESCs. We used quantitative chromatin immunoprecipitation (ChIP) analysis to test whether GibbsModule could distinguish functional from nonfunctional binding sites. All seven tested binding sites in GibbsModule-predicted CRMs had higher ChIP signals than the other seven tested binding sites located outside of predicted CRMs. GibbsModule is available at http://biocomp.bioen.uiuc.edu/GibbsModule.
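To make the iterative motif update concrete, the following is a minimal sketch of a classic single-motif Gibbs site sampler, the general technique that GibbsModule builds on. It is not the GibbsModule implementation: it does not model homologous gene groups or CRMs. The function names, the uniform 0.25 background, the pseudocount, and the iteration count are all assumptions chosen for illustration, and input sequences are assumed to contain only A, C, G, and T.

```python
import random

BASES = "ACGT"

def build_profile(seqs, starts, width, skip, pseudo=1.0):
    """Per-column base probabilities from every current site except sequence `skip`."""
    counts = [{b: pseudo for b in BASES} for _ in range(width)]
    for i, (seq, s) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][seq[s + j]] += 1
    profile = []
    for col in counts:
        total = sum(col.values())
        profile.append({b: col[b] / total for b in BASES})
    return profile

def gibbs_motif_search(seqs, width, n_iter=2000, seed=0):
    """Return one sampled motif start per sequence (hypothetical helper, not GibbsModule)."""
    rng = random.Random(seed)
    # Start from a random site in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(n_iter):
        # Hold one sequence out and rebuild the motif from the others.
        i = rng.randrange(len(seqs))
        profile = build_profile(seqs, starts, width, skip=i)
        seq = seqs[i]
        # Score each candidate site by its motif-to-background likelihood ratio
        # (assumed uniform background of 0.25 per base).
        weights = []
        for s in range(len(seq) - width + 1):
            w = 1.0
            for j in range(width):
                w *= profile[j][seq[s + j]] / 0.25
            weights.append(w)
        # Resample the held-out site in proportion to those scores.
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts

if __name__ == "__main__":
    # Toy promoter-like sequences, invented for this example.
    toy = ["ACGTTGACGTCAAC", "TTGACGTCAGGGTA", "CCATGACGTCATTT"]
    for seq, start in zip(toy, gibbs_motif_search(toy, width=8)):
        print(start, seq[start:start + 8])
```

Because the sampler is stochastic, the reported sites depend on the seed and the number of iterations; in practice such samplers are usually run from several random starts and the highest-scoring result is kept.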