Frith Martin C, Spouge John L, Hansen Ulla, Weng Zhiping
Bioinformatics Program, Boston University, 44 Cummington Street, Boston MA 02215, USA.
Nucleic Acids Res. 2002 Jul 15;30(14):3214-24. doi: 10.1093/nar/gkf438.
The human genome encodes the transcriptional control of its genes in clusters of cis-elements that constitute enhancers, silencers and promoter signals. The sequence motifs of individual cis-elements are usually too short and degenerate for confident detection. In most cases, the requirements for organization of cis-elements within these clusters are poorly understood. Therefore, we have developed a general method to detect local concentrations of cis-element motifs, using predetermined matrix representations of the cis-elements, and calculate the statistical significance of these motif clusters. The statistical significance calculation is highly accurate not only for idealized, pseudorandom DNA, but also for real human DNA. We use our method 'cluster of motifs E-value tool' (COMET) to make novel predictions concerning the regulation of genes by transcription factors associated with muscle. COMET performs comparably with two alternative state-of-the-art techniques, which are more complex and lack E-value calculations. Our statistical method enables us to clarify the major bottleneck in the hard problem of detecting cis-regulatory regions, which is that many known enhancers do not contain very significant clusters of the motif types that we search for. Thus, discovery of additional signals that belong to these regulatory regions will be the key to future progress.
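The general idea of scoring a sequence against a matrix representation of a motif and then looking for local concentrations of matches can be illustrated with a minimal sketch. This is not the COMET algorithm and it computes no E-values; it is a simplified stand-in that sums log-odds scores of matches inside a fixed-length window. The function names, the uniform background frequencies, the pseudocount, the score threshold and the window length are all assumptions made for illustration, and the input is assumed to contain only uppercase A, C, G and T.

```python
import math

# Assumed uniform background base frequencies (illustrative only).
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_matrix(counts, pseudocount=1.0):
    """Turn per-position base counts into log2-odds scores against BACKGROUND."""
    matrix = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        matrix.append({
            base: math.log2(
                ((column.get(base, 0) + pseudocount) / total) / BACKGROUND[base]
            )
            for base in "ACGT"
        })
    return matrix


def motif_hits(sequence, matrix, threshold):
    """Return (start, score) for each position scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = sum(matrix[i][sequence[start + i]] for i in range(width))
        if score >= threshold:
            hits.append((start, score))
    return hits  # already ordered by start position


def best_cluster(hits, window):
    """Find the window with the largest summed hit score.

    Each candidate window begins at a hit and spans `window` bases.
    Returns (total_score, window_start, member_hits).
    """
    best = (0.0, None, [])
    for i, (start, _) in enumerate(hits):
        members = [(pos, s) for pos, s in hits[i:] if pos < start + window]
        total = sum(s for _, s in members)
        if total > best[0]:
            best = (total, start, members)
    return best
```

To search for several motif types at once under the same assumptions, the hit lists from each matrix could be concatenated and sorted by position before calling `best_cluster`. Assessing whether the resulting cluster score is statistically significant is the part this sketch leaves out.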