Berman Benjamin P, Nibu Yutaka, Pfeiffer Barret D, Tomancak Pavel, Celniker Susan E, Levine Michael, Rubin Gerald M, Eisen Michael B
Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA.
Proc Natl Acad Sci U S A. 2002 Jan 22;99(2):757-62. doi: 10.1073/pnas.231608898.
A major challenge in interpreting genome sequences is understanding how the genome encodes the information that specifies when and where a gene will be expressed. The first step in this process is the identification of regions of the genome that contain regulatory information. In higher eukaryotes, this cis-regulatory information is organized into modular units [cis-regulatory modules (CRMs)] of a few hundred base pairs. A common feature of these cis-regulatory modules is the presence of multiple binding sites for multiple transcription factors. Here, we evaluate the extent to which the tendency for transcription factor binding sites to be clustered can be used as the basis for the computational identification of cis-regulatory modules. By using published DNA binding specificity data for five transcription factors active in the early Drosophila embryo, we identified genomic regions containing unusually high concentrations of predicted binding sites for these factors. A significant fraction of these binding site clusters overlap known CRMs that are regulated by these factors. In addition, many of the remaining clusters are adjacent to genes expressed in a pattern characteristic of genes regulated by these factors. We tested one of the newly identified clusters, mapping upstream of the gap gene giant (gt), and show that it acts as an enhancer that recapitulates the posterior expression pattern of gt.
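The abstract describes the search only at a high level. Genomic sequence is scored with published binding specificities for several factors, and regions with unusually dense predicted sites are flagged. The sketch below shows that general idea. It is not the authors' published implementation, and it rests on several assumptions:

- Each factor's specificity is a position weight matrix built from per-position base counts.
- Sites are scored as log-odds against a uniform background, on both strands.
- Each factor has a hypothetical per-factor score threshold.
- Sites from all factors are pooled, and a span is reported when at least `min_sites` site starts fall within `window` bp.

The function names, the pseudocount, the thresholds, the window size and the minimum site count are illustrative choices, not values taken from the paper. The toy motif used in the usage lines is invented.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Illustrative sketch only: not the published method. Thresholds, window size
# and minimum site counts below are hypothetical parameters.
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
PWM = List[Dict[str, float]]


def log_odds_pwm(counts: Sequence[Dict[str, int]],
                 background: Dict[str, float],
                 pseudocount: float = 0.5) -> PWM:
    """Convert per-position base counts into natural-log odds vs. a background."""
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log((col.get(b, 0) + pseudocount) / total / background[b])
                    for b in BASES})
    return pwm


def revcomp(seq: str) -> str:
    """Reverse complement of an ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def score(pwm: PWM, kmer: str) -> float:
    """Sum of per-position log-odds for a k-mer as long as the matrix."""
    return sum(col[base] for col, base in zip(pwm, kmer))


def find_sites(seq: str, pwm: PWM, threshold: float) -> List[int]:
    """Start positions where either strand scores >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        kmer = seq[i:i + width]
        if any(b not in BASES for b in kmer):  # skip windows containing N, etc.
            continue
        if max(score(pwm, kmer), score(pwm, revcomp(kmer))) >= threshold:
            hits.append(i)
    return hits


def find_clusters(site_starts: List[int], window: int,
                  min_sites: int) -> List[Tuple[int, int]]:
    """Merged (first_site_start, last_site_start) spans in which at least
    min_sites site starts fall within `window` bp of the span's first site."""
    starts = sorted(site_starts)
    clusters: List[Tuple[int, int]] = []
    k = 0
    for i in range(len(starts)):
        k = max(k, i)
        # Extend k while the next site still starts inside the window opened at starts[i].
        while k + 1 < len(starts) and starts[k + 1] < starts[i] + window:
            k += 1
        if k - i + 1 >= min_sites:
            lo, hi = starts[i], starts[k]
            if clusters and lo <= clusters[-1][1]:  # overlaps previous span: merge
                clusters[-1] = (clusters[-1][0], max(clusters[-1][1], hi))
            else:
                clusters.append((lo, hi))
    return clusters


def find_crm_candidates(seq: str, factors: Dict[str, Tuple[PWM, float]],
                        window: int, min_sites: int) -> List[Tuple[int, int]]:
    """Pool predicted sites of all factors (name -> (pwm, threshold)) and cluster them."""
    pooled: List[int] = []
    for pwm, threshold in factors.values():
        pooled.extend(find_sites(seq, pwm, threshold))
    return find_clusters(pooled, window, min_sites)


if __name__ == "__main__":
    uniform = {b: 0.25 for b in BASES}
    # Invented toy motif with consensus TAATCC (counts are made up for illustration).
    toy_counts = [{"T": 10}, {"A": 10}, {"A": 10}, {"T": 10}, {"C": 10}, {"C": 10}]
    pwm = log_odds_pwm(toy_counts, uniform)
    site = "TAATCC"
    seq = "C" * 20 + site + "C" * 10 + site + "C" * 10 + site + "C" * 200 + site + "C" * 50
    print(find_sites(seq, pwm, threshold=6.0))  # [20, 36, 52, 258]
    print(find_crm_candidates(seq, {"toy": (pwm, 6.0)}, window=100, min_sites=3))  # [(20, 52)]
```

Overlapping qualifying windows are merged into a single span. A position that passes more than one factor's threshold is counted once per factor. Deciding whether a cluster is *unusually* dense would also need a background expectation, for example the site density seen in randomly chosen genomic windows. This sketch leaves that step out.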