Narlikar Leelavati, Gordân Raluca, Hartemink Alexander J
Department of Computer Science, Duke University, Durham, North Carolina, United States of America.
PLoS Comput Biol. 2007 Nov;3(11):e215. doi: 10.1371/journal.pcbi.0030215. Epub 2007 Sep 24.
Finding functional DNA binding sites of transcription factors (TFs) throughout the genome is a crucial step in understanding transcriptional regulation. Unfortunately, these binding sites are typically short and degenerate, posing a significant statistical challenge: many more matches to known TF motifs occur in the genome than are actually functional. However, information about chromatin structure may help to identify the functional sites. In particular, it has been shown that active regulatory regions are usually depleted of nucleosomes, thereby enabling TFs to bind DNA in those regions. Here, we describe a novel motif discovery algorithm that employs an informative prior over DNA sequence positions based on a discriminative view of nucleosome occupancy. When a Gibbs sampling algorithm is applied to yeast sequence-sets identified by ChIP-chip, the correct motif is found in 52% more cases with our informative prior than with the commonly used uniform prior. This is the first demonstration that nucleosome occupancy information can be used to improve motif discovery. The improvement is dramatic, even though we are using only a statistical model to predict nucleosome occupancy; we expect our results to improve further as high-resolution genome-wide experimental nucleosome occupancy data becomes increasingly available.
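The abstract's central idea, replacing the uniform prior over sequence positions with an informative one inside a Gibbs sampler, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `site_weights` and `sample_site`, the toy position weight matrix, and the prior values are all hypothetical, and the prior here is a hand-written list rather than one derived from a nucleosome occupancy model. The sketch shows only the single step in which a candidate motif start is weighted by its prior times the motif-to-background likelihood ratio.

```python
import math
import random


def site_weights(seq, pwm, background, prior):
    """Return normalized weights over motif start positions.

    Each start position i gets weight prior[i] * P(window | motif) / P(window | background).
    `pwm` is a list of dicts (one per motif column) mapping base -> probability.
    `prior` holds one non-negative value per valid start position (assumed input).
    """
    width = len(pwm)
    n_starts = len(seq) - width + 1
    raw = []
    for start in range(n_starts):
        ratio = 1.0
        for offset in range(width):
            base = seq[start + offset]
            ratio *= pwm[offset][base] / background[base]
        raw.append(prior[start] * ratio)
    total = sum(raw)
    return [w / total for w in raw]


def sample_site(seq, pwm, background, prior, rng):
    """Draw one motif start position according to the weights above."""
    weights = site_weights(seq, pwm, background, prior)
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def make_column(preferred):
    """Toy motif column: 0.85 for the preferred base, 0.05 for each other base."""
    return {b: (0.85 if b == preferred else 0.05) for b in "ACGT"}


# Toy data (illustrative only): two identical TATA matches at positions 0 and 8.
seq = "TATAGGGGTATA"
pwm = [make_column(b) for b in "TATA"]
background = {b: 0.25 for b in "ACGT"}
n_starts = len(seq) - len(pwm) + 1

# Uniform prior: both matches are equally likely to be chosen.
uniform = [1.0] * n_starts
w_uniform = site_weights(seq, pwm, background, uniform)

# Hypothetical informative prior: position 8 is assumed to lie in a
# nucleosome-depleted region, so it receives nine times the prior mass.
informative = [0.1] * n_starts
informative[8] = 0.9
w_informative = site_weights(seq, pwm, background, informative)

# A prior concentrated on one position forces the sampler to pick it.
point_prior = [0.0] * n_starts
point_prior[8] = 1.0
chosen = sample_site(seq, pwm, background, point_prior, random.Random(0))
```

With the uniform prior the two identical matches receive equal weight, whereas the informative prior shifts the weight toward the position assumed to be accessible. In a full sampler this step would alternate with re-estimating the motif model from the currently selected sites.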