Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana-Champaign, Illinois, United States of America.
PLoS Biol. 2010 Aug 17;8(8):e1000456. doi: 10.1371/journal.pbio.1000456.
Cis-regulatory modules that drive precise spatial-temporal patterns of gene expression are central to the process of metazoan development. We describe a new computational strategy to annotate genomic sequences based on their "pattern generating potential" and to produce quantitative descriptions of transcriptional regulatory networks at the level of individual protein-module interactions. We use this approach to convert the qualitative understanding of interactions that regulate Drosophila segmentation into a network model in which a confidence value is associated with each transcription factor-module interaction. Sequence information from multiple Drosophila species is integrated with transcription factor binding specificities to determine conserved binding site frequencies across the genome. These binding site profiles are combined with transcription factor expression information to create a model to predict module activity patterns. This model is used to scan genomic sequences for the potential to generate all or part of the expression pattern of a nearby gene, obtained from available gene expression databases. Interactions between individual transcription factors and modules are inferred by a statistical method to quantify a factor's contribution to the module's pattern generating potential. We use these pattern generating potentials to systematically describe the location and function of known and novel cis-regulatory modules in the segmentation network, identifying many examples of modules predicted to have overlapping expression activities. Surprisingly, conserved transcription factor binding site frequencies were as effective as experimental measurements of occupancy in predicting module expression patterns or factor-module interactions. Thus, unlike previous module prediction methods, this method predicts not only the location of modules but also their spatial activity pattern and the factors that directly determine this pattern. 
As databases of transcription factor specificities and in vivo gene expression patterns grow, analysis of pattern generating potentials provides a general method to decode transcriptional regulatory sequences and networks.
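To make the idea of combining binding site profiles with transcription factor expression more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple logistic model in which each factor's input to a module is its expression level scaled by the module's site count for that factor. It then scores a factor's contribution as the increase in prediction error when that factor's sites are removed. All names, weights, and the toy data (`site_counts`, `tf_expression`, `weights`, `bias`, `observed`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def predict_pattern(site_counts, tf_expression, weights, bias):
    """Predict module activity along an axis (values in 0..1).

    site_counts:   (n_factors,) binding site frequency of each factor in the module
    tf_expression: (n_factors, n_positions) expression of each factor per position
    weights:       (n_factors,) hypothetical sign/strength of each factor
    """
    # Each factor's drive is its expression scaled by weight * site count.
    drive = (weights * site_counts) @ tf_expression + bias
    return 1.0 / (1.0 + np.exp(-drive))  # logistic squashing

def factor_contribution(site_counts, tf_expression, weights, bias, observed, factor):
    """Increase in mean squared error when one factor's sites are zeroed out."""
    def error(pred):
        return np.mean((pred - observed) ** 2)
    full = predict_pattern(site_counts, tf_expression, weights, bias)
    reduced_counts = site_counts.copy()
    reduced_counts[factor] = 0.0
    reduced = predict_pattern(reduced_counts, tf_expression, weights, bias)
    return error(reduced) - error(full)

# Toy example (assumed data): factor 0 acts as an activator, factor 1 as a repressor.
site_counts = np.array([2.0, 1.0])
tf_expression = np.array([
    [1.0, 1.0, 1.0, 0.0, 0.0],  # activator present toward one end
    [0.0, 0.0, 1.0, 1.0, 1.0],  # repressor present toward the other end
])
weights = np.array([4.0, -4.0])
bias = -4.0
observed = np.array([1.0, 1.0, 0.0, 0.0, 0.0])  # pattern of the nearby gene

predicted = predict_pattern(site_counts, tf_expression, weights, bias)
activator_score = factor_contribution(site_counts, tf_expression, weights, bias, observed, 0)
repressor_score = factor_contribution(site_counts, tf_expression, weights, bias, observed, 1)
```

In this sketch, a larger score means that removing the factor degrades the match to the observed pattern more, which is one simple way to attach a confidence-like value to a factor-module interaction. The statistical method used in the paper may differ.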