Zhao Guoyan, Schriefer Lawrence A, Stormo Gary D
Department of Genetics, Washington University School of Medicine, St. Louis, Missouri 63110, USA.
Genome Res. 2007 Mar;17(3):348-57. doi: 10.1101/gr.5989907. Epub 2007 Feb 6.
Transcriptional regulation is the major regulatory mechanism that controls the spatial and temporal expression of genes during development. This is carried out by transcription factors (TFs), which recognize and bind to their cognate binding sites. Recent studies suggest a modular organization of TF-binding sites, in which clusters of transcription-factor binding sites cooperate in the regulation of downstream gene expression. In this study, we report our computational identification and experimental verification of muscle-specific cis-regulatory modules in Caenorhabditis elegans. We first identified a set of motifs that are correlated with muscle-specific gene expression. We then predicted muscle-specific regulatory modules based on clusters of those motifs with characteristics similar to a collection of well-studied modules in other species. The method correctly identifies 88% of the experimentally characterized modules with a positive predictive value of at least 65%. The prediction accuracy of muscle-specific expression on an independent test set is highly significant (P<0.0001). We performed in vivo experimental tests of 12 predicted modules, and 10 of those drive muscle-specific gene expression. These results suggest that our method is highly accurate in identifying functional sequences important for muscle-specific gene expression and is a valuable tool for guiding experimental designs.
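The abstract describes predicting modules from clusters of motifs and scoring the predictions by the fraction of known modules recovered and by positive predictive value. The following is a minimal sketch of that general idea, not the authors' implementation. The motif patterns, the window size, the `min_distinct` threshold, and all function names are hypothetical placeholders chosen for illustration; the study's actual motifs and module criteria are not given in the abstract.

```python
import re


def find_motif_hits(sequence, motifs):
    """Return sorted (position, motif_name) pairs for every motif occurrence.

    `motifs` maps a name to a regular-expression pattern. The patterns used
    below are toy placeholders, not motifs reported in the study.
    """
    hits = []
    for name, pattern in motifs.items():
        # A lookahead reports overlapping occurrences as well.
        for match in re.finditer(f"(?={pattern})", sequence):
            hits.append((match.start(), name))
    return sorted(hits)


def predict_modules(sequence, motifs, window=50, min_distinct=2):
    """Flag hit-anchored windows that contain enough distinct motifs.

    Assumption: a candidate module is a window of `window` bp, starting at a
    motif hit, that holds at least `min_distinct` different motifs.
    """
    hits = find_motif_hits(sequence, motifs)
    modules = []
    for i, (start, _) in enumerate(hits):
        in_window = [(pos, name) for pos, name in hits[i:] if pos < start + window]
        names = {name for _, name in in_window}
        if len(names) >= min_distinct:
            end = max(pos for pos, _ in in_window)
            modules.append((start, end, sorted(names)))
    return modules


def sensitivity_and_ppv(tp, fp, fn):
    """Sensitivity = TP / (TP + FN); positive predictive value = TP / (TP + FP)."""
    return tp / (tp + fn), tp / (tp + fp)


# Toy sequence: two different motifs close together, then an isolated hit.
toy_motifs = {"motif_a": "ACGTAC", "motif_b": "TTGCAA"}
toy_sequence = "AAAA" + "ACGTAC" + "GGGG" + "TTGCAA" + "G" * 100 + "ACGTAC"
print(predict_modules(toy_sequence, toy_motifs))
```

In this toy input only the first two hits fall within one window, so a single candidate module is reported and the isolated third hit is ignored. The same arithmetic as `sensitivity_and_ppv` underlies the validation figure in the abstract: 10 of the 12 tested predictions drove muscle-specific expression, about 83% of those tested.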