Murakami Katsuhiko, Imanishi Tadashi, Gojobori Takashi, Nakai Kenta
Integrated Database Group, Japan Biological Information Research Center (JBIRC), Japan Biological Informatics Consortium, Aomi 2-41, Koto-ku, Tokyo, 135-0064, Japan.
BMC Genomics. 2008 Mar 1;9:112. doi: 10.1186/1471-2164-9-112.
Understanding how transcriptional regulatory regions are composed of cis-elements is essential in modern biology, yet little is known about, for example, the combinatorial use of these elements and their positional distribution.
We predicted the positions of 228 known binding motifs for transcription factors in phylogenetically conserved regions between -2000 and +1000 bp of the transcriptional start sites (TSSs) of human genes and visualized their correlated non-overlapping occurrences. Among the 8,454 significantly correlated motif pairs, two major classes were observed: 248 pairs in Class 1 were mainly found around TSSs, whereas 4,020 pairs in Class 2 appeared at rather arbitrary distances from TSSs. These classes are distinct in a number of aspects. First, the positional distribution of the Class 1 constituent motifs shows a single peak near the TSSs, whereas Class 2 motifs show a relatively broad distribution. Second, genes that harbor Class 1 pairs are more likely to be CpG-rich and to be expressed ubiquitously than those that harbor Class 2 pairs. Third, the 'hub' motifs, which are used in many different motif pairs, differ between the two classes. In addition, many of the transcription factors that correspond to the Class 2 hub motifs contain domains rich in specific amino acids; these domains may form disordered regions important for protein-protein interaction.
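As a rough illustration of what "significantly correlated non-overlapping occurrences" could mean in practice, the sketch below counts promoters in which two motifs both occur without overlapping and scores that count with a one-sided hypergeometric tail. The abstract does not state which statistical test the authors used, so the test, the `alpha` threshold, the input layout, and all function names here are assumptions made for illustration; no multiple-testing correction is applied.

```python
from itertools import combinations
from math import comb


def hypergeom_tail(k, n_total, n_a, n_b):
    """P(X >= k) for the number of promoters shared by two motifs,
    assuming motif A occurs in n_a and motif B in n_b of n_total
    promoters independently (hypergeometric null model)."""
    upper = min(n_a, n_b)
    denom = comb(n_total, n_b)
    numer = sum(comb(n_a, i) * comb(n_total - n_a, n_b - i)
                for i in range(k, upper + 1))
    return numer / denom


def non_overlapping(hits_a, hits_b):
    """True if some hit of A and some hit of B do not overlap.
    Hits are half-open (start, end) intervals relative to the TSS."""
    return any(a_end <= b_start or b_end <= a_start
               for a_start, a_end in hits_a
               for b_start, b_end in hits_b)


def correlated_pairs(promoters, alpha=0.05):
    """promoters: dict promoter_id -> dict motif_name -> list of (start, end).
    Returns (motif_a, motif_b, co_occurrence_count, p_value) tuples
    for pairs whose tail probability falls below alpha (hypothetical cutoff)."""
    n_total = len(promoters)
    motifs = sorted({m for hits in promoters.values() for m in hits})
    results = []
    for a, b in combinations(motifs, 2):
        n_a = sum(1 for h in promoters.values() if a in h)
        n_b = sum(1 for h in promoters.values() if b in h)
        # Count only promoters where the two motifs occur without overlapping.
        k = sum(1 for h in promoters.values()
                if a in h and b in h and non_overlapping(h[a], h[b]))
        p = hypergeom_tail(k, n_total, n_a, n_b)
        if p < alpha:
            results.append((a, b, k, p))
    return results
```

In a real analysis the positions would be restricted to conserved regions within the -2000 to +1000 bp window, and the p-values would need correction for the large number of motif pairs tested.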
At least two classes of motif pairs exist with respect to TSSs in human promoters, possibly reflecting compositional differences between promoters and enhancers. We anticipate that our visualization method may be useful for the further characterisation of promoters.