Systems Biology Doctoral Training Centre, University of Warwick, Coventry CV4 7AL, UK.
Plant J. 2010 Oct;64(1):165-76. doi: 10.1111/j.1365-313X.2010.04314.x. Epub 2010 Sep 16.
Identification of regulatory sequences within non-coding regions of DNA is an essential step towards elucidation of gene networks. This task constitutes a major challenge, however, as only a very small fraction of non-coding DNA is thought to contribute to gene regulation. The mapping of regulatory regions traditionally involves the laborious construction of promoter deletion series, which are then fused to reporter genes and assayed in transgenic organisms. Bioinformatic methods can be used to scan sequences for matches to known regulatory motifs; however, these methods are currently hampered by the relatively small number of such motifs and by a high false-discovery rate. Here, we demonstrate a robust and highly sensitive in silico method to identify evolutionarily conserved regions within non-coding DNA. Sequence conservation within these regions is taken as evidence for evolutionary pressure against mutations, which is suggestive of functional importance. We test this method on a small set of well-characterised promoters and show that it successfully identifies known regulatory regions. We further show that these evolutionarily conserved sequences contain clusters of transcription factor binding sites, often described as regulatory modules. A version of the tool optimised for the analysis of plant promoters is available online at http://wsbc.warwick.ac.uk/ears/main.php.
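The abstract does not describe the algorithm itself. As a rough illustration of the general idea of locating conserved stretches in non-coding DNA, the sketch below computes percent identity in a sliding window over an existing pairwise alignment of two orthologous promoters and merges passing windows into regions. It is not the authors' tool. The function name `conserved_regions`, the window size and the identity threshold are hypothetical example choices, and the alignment is assumed to have been produced elsewhere.

```python
from typing import List, Tuple


def conserved_regions(aln_a: str, aln_b: str, window: int = 20,
                      min_identity: float = 0.8) -> List[Tuple[int, int]]:
    """Return merged (start, end) alignment-column intervals (0-based,
    end-exclusive) whose sliding-window identity is >= min_identity.

    Hypothetical illustration only: aln_a and aln_b are assumed to be an
    already computed gapped pairwise alignment (equal length, '-' for gaps).
    The window size and threshold are arbitrary example values.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    n = len(aln_a)
    if n < window:
        return []
    # 1 where both columns carry the same non-gap base, otherwise 0
    match = [int(a == b and a != "-")
             for a, b in zip(aln_a.upper(), aln_b.upper())]
    regions: List[Tuple[int, int]] = []
    score = sum(match[:window])  # identical columns in the first window
    for start in range(n - window + 1):
        if start > 0:
            # Slide one column: drop the column that left, add the new one
            score += match[start + window - 1] - match[start - 1]
        if score / window >= min_identity:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlapping or adjacent passing window: extend the region
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions
```

For example, `conserved_regions("A"*15 + "C"*15 + "G"*15, "A"*15 + "T"*15 + "G"*15, window=10)` returns `[(0, 17), (28, 45)]`: both identical flanks are reported, and the first region spills two columns past the true boundary because a window only needs 80% identity to pass. Smaller windows sharpen boundaries at the cost of more noise, which is one reason a real tool would need a statistical rather than a fixed threshold.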