University of California Berkeley and University of California San Francisco Joint Graduate Group in Bioengineering, University of California, Berkeley, California, United States of America.
PLoS One. 2009 Sep 4;4(9):e6901. doi: 10.1371/journal.pone.0006901.
BACKGROUND: Recognizing regulatory sequences in genomes is a continuing challenge, despite a wealth of available genomic data and a growing number of experimentally validated examples.
METHODOLOGY/PRINCIPAL FINDINGS: We discuss here a simple approach to search for regulatory sequences based on the compositional similarity of genomic regions and known cis-regulatory sequences. This method, which is not limited to searching for predefined motifs, recovers sequences known to be under similar regulatory control. The words shared by the recovered sequences often correspond to known binding sites. Furthermore, we show that although local word profile clustering is predictive of the regulatory sequences involved in blastoderm segmentation, local dissimilarity is a more universal feature of known regulatory sequences in Drosophila.
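The compositional-similarity search described above can be illustrated with a minimal sketch. It assumes that a "word profile" means the counts of overlapping k-letter words (k-mers) in a sequence. Genomic windows are then ranked by how closely their profiles match the profile of a known cis-regulatory sequence. All function names, the defaults for `k`, `window`, and `step`, and the use of cosine similarity are hypothetical stand-ins chosen for illustration. This is not the WPH-finder implementation or the paper's exact scoring.

```python
from collections import Counter
from math import sqrt

# Hypothetical sketch, not WPH-finder's code: k, window, step and the use of
# cosine similarity are illustrative assumptions.
ACGT = frozenset("ACGT")


def word_profile(seq: str, k: int = 4) -> Counter:
    """Count overlapping k-letter words in seq, skipping words with non-ACGT letters."""
    seq = seq.upper()
    words = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in words if set(w) <= ACGT)


def cosine_similarity(p: Counter, q: Counter) -> float:
    """Cosine similarity of two word-count profiles (0.0 if either profile is empty)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def scan_genome(query: str, genome: str, window: int = 1000, step: int = 100, k: int = 4):
    """Score sliding windows of genome by profile similarity to a known regulatory query.

    Returns (start, score) pairs sorted from most to least similar.
    """
    qprof = word_profile(query, k)
    hits = []
    for start in range(0, max(len(genome) - window, 0) + 1, step):
        wprof = word_profile(genome[start:start + window], k)
        hits.append((start, cosine_similarity(qprof, wprof)))
    return sorted(hits, key=lambda h: h[1], reverse=True)


def shared_words(a: str, b: str, k: int = 4, top: int = 5):
    """Words present in both sequences, ranked by the smaller of their two counts."""
    pa, pb = word_profile(a, k), word_profile(b, k)
    common = {w: min(pa[w], pb[w]) for w in pa.keys() & pb.keys()}
    return sorted(common.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
```

The workflow is as follows. Call `scan_genome(known_crm, region)` to rank candidate windows. Then call `shared_words(known_crm, region[start:start + window])` on a top hit to list the words that drive the match, which are the words one would compare against known binding sites. A fuller version might merge each word with its reverse complement so that both strands count together.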
CONCLUSIONS/SIGNIFICANCE: Our method leverages sequence motifs within a known regulatory sequence to identify co-regulated sequences without explicitly defining binding sites. We also show that regulatory sequences can be distinguished from surrounding sequences by local sequence dissimilarity, a novel feature for identifying regulatory sequences across a genome. Source code for WPH-finder is available for download at http://rana.lbl.gov/downloads/wph.tar.gz.
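The idea of local sequence dissimilarity can be made concrete with a small sketch. It scores each window by how different its word profile is from the profile of its immediate surroundings. The definition used here is an assumption made for illustration: 1 minus the cosine similarity between a window and the pooled equal-sized windows to its left and right. The function name `local_dissimilarity` and its parameters are hypothetical, and the statistic used in the paper may differ.

```python
from collections import Counter
from math import sqrt

# Hypothetical sketch: the dissimilarity definition, window sizes and k are
# illustrative assumptions, not the paper's exact statistic.
ACGT = frozenset("ACGT")


def word_profile(seq: str, k: int = 4) -> Counter:
    """Count overlapping k-letter words in seq, skipping words with non-ACGT letters."""
    seq = seq.upper()
    words = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in words if set(w) <= ACGT)


def cosine_similarity(p: Counter, q: Counter) -> float:
    """Cosine similarity of two word-count profiles (0.0 if either profile is empty)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


def local_dissimilarity(seq: str, window: int = 500, step: int = 50, k: int = 4):
    """Score each window by how unlike its flanking sequence it is.

    Score = 1 - cosine similarity between the window's word profile and the
    pooled profile of the equal-sized windows immediately to its left and right.
    Returns (start, score) pairs; higher scores mean more locally distinct windows.
    """
    scores = []
    for start in range(window, len(seq) - 2 * window + 1, step):
        center = word_profile(seq[start:start + window], k)
        flanks = word_profile(seq[start - window:start], k) + word_profile(
            seq[start + window:start + 2 * window], k
        )
        scores.append((start, 1.0 - cosine_similarity(center, flanks)))
    return scores
```

With this definition, a window whose composition matches its neighbors scores near 0. A window with a composition unlike its neighbors scores near 1. Candidate regulatory regions would then be the local maxima of the score along a chromosome.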