Dieterich Christoph, Grossmann Steffen, Tanzer Andrea, Röpcke Stefan, Arndt Peter F, Stadler Peter F, Vingron Martin
Computational Molecular Biology Department, Max Planck Institute for Molecular Genetics, Ihnestrasse 73, 14195 Berlin, Germany.
BMC Genomics. 2005 Feb 21;6:24. doi: 10.1186/1471-2164-6-24.
Promoters are key players in gene regulation. They receive signals from various sources (e.g. cell surface receptors) and control the level of transcription initiation, which largely determines gene expression. In vertebrates, transcription start sites and the surrounding regulatory elements are often poorly defined. To support promoter analysis, we present CORG (http://corg.molgen.mpg.de), a framework for studying upstream regions, including untranslated exons (5' UTR).
The automated annotation of promoter regions integrates two kinds of information. First, statistically significant cross-species conservation is detected within the upstream regions of orthologous genes, using both pairwise and multiple sequence comparisons. Second, binding site descriptions (position-weight matrices) are used to predict conserved regulatory elements with a novel approach. Assembled EST sequences and verified transcription start sites are incorporated to distinguish exonic from other sequences. The analysis pipeline currently covers 5 species (human, mouse, rat, fugu and zebrafish), and we have characterized the promoter regions of 16,127 groups of orthologous genes. All data are presented in an intuitive way via our web site. Users can export data for single genes or access larger data sets through our DAS server (http://tomcat.molgen.mpg.de:8080/das). We illustrate the benefits of the framework with two applications: phylogenetic profiling of transcription factor binding sites, and detection of microRNAs close to the transcription start sites of our gene set.
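To make the position-weight matrix step concrete, here is a minimal sketch of how a matrix can be used to score candidate sites in an upstream sequence. It is a generic log-odds scan, not the novel CORG prediction method, which the abstract does not describe. The count matrix, pseudocount, uniform background and threshold are all illustrative assumptions.

```python
import math

# Hypothetical count matrix for a 4-bp TATA-like motif (one dict per position).
# These counts are invented for illustration and are not taken from CORG.
COUNTS = [
    {"A": 1, "C": 0, "G": 1, "T": 8},
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 1, "C": 0, "G": 1, "T": 8},
    {"A": 8, "C": 0, "G": 1, "T": 1},
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    Assumes a uniform background frequency for all four bases.
    """
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, pwm, threshold):
    """Return (start, site, score) for every window scoring at least threshold."""
    seq = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


pwm = log_odds_matrix(COUNTS)
hits = scan("GGTATAGG", pwm, threshold=3.0)
for start, site, score in hits:
    print(start, site, round(score, 2))
```

In a conservation-aware setting, such a scan would be run on the orthologous upstream regions of each species, and only hits that fall at corresponding positions of a significant alignment would be kept. That filtering step is omitted here.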
The CORG platform is a versatile tool for analysing gene regulation in vertebrate promoter regions. Its applications range from studying the evolution of DNA binding sites and promoter composition to discovering new regulatory sequence elements (e.g. microRNAs and binding sites).
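For readers who want to pull larger data sets programmatically, the sketch below builds a request URL for the `features` command defined by the DAS 1.x protocol. The data source name, segment identifier and coordinates are hypothetical placeholders; the abstract gives only the server base URL, so the actual source names would have to be looked up on the server itself.

```python
from urllib.parse import quote


def das_features_url(base_url, source, segment_id, start, stop):
    """Build a DAS 1.x 'features' request URL for one sequence segment.

    `source` is the DAS data source name; the value used below is a
    placeholder, since the abstract does not list CORG's source names.
    """
    segment = f"{segment_id}:{start},{stop}"
    return (
        f"{base_url.rstrip('/')}/{quote(source)}/features"
        f"?segment={quote(segment, safe=':,')}"
    )


url = das_features_url(
    "http://tomcat.molgen.mpg.de:8080/das",  # base URL from the abstract
    "example_source",                        # hypothetical data source name
    "1",                                     # hypothetical segment (chromosome 1)
    1000,
    2000,
)
print(url)
```

The response to such a request is an XML document listing the annotated features in the segment, which any standard XML parser can read.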