Suzuki Y, Tsunoda T, Sese J, Taira H, Mizushima-Sugano J, Hata H, Ota T, Isogai T, Tanaka T, Nakamura Y, Suyama A, Sakaki Y, Morishita S, Okubo K, Sugano S
Department of Virology, Institute of Medical Science, University of Tokyo, Minato-ku, Tokyo 108-8639, Japan.
Genome Res. 2001 May;11(5):677-84. doi: 10.1101/gr.gr-1640r.
To understand the mechanism of transcriptional regulation, it is essential to identify and characterize the promoter, which is located proximal to the mRNA start site. To identify promoters in large volumes of genomic sequence, we used mRNA start sites determined by large-scale sequencing of cDNA libraries constructed by the "oligo-capping" method. We aligned the mRNA start sites with the genomic sequences and retrieved the adjacent sequences as potential promoter regions (PPRs) for 1031 genes. The PPR sequences were searched to determine the frequencies of major promoter elements. Among the 1031 PPRs, 329 (32%) contained TATA boxes, 872 (85%) contained initiators (Inr), 999 (97%) contained GC boxes, and 663 (64%) contained CAAT boxes. Furthermore, 493 (48%) PPRs were located in CpG islands. This frequency of CpG islands was reduced in TATA(+)/Inr(+) PPRs and in the PPRs of ubiquitously expressed genes. In the PPRs of the CGM2, DRA, and TM30pl genes, which showed highly colon-specific expression patterns, E-box consensus sequences were commonly observed. The PPRs were also useful for exploring promoter SNPs.
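The element-frequency search described in the abstract can be illustrated with a minimal sketch. The abstract does not give the search criteria the authors used, so the consensus patterns below (ELEMENT_PATTERNS), the function names, and the toy sequences are all assumptions chosen for illustration only, not the paper's method.

```python
import re

# Hypothetical, simplified consensus patterns (both strands where the motif
# is not palindromic). These are illustrative assumptions, not the criteria
# used in the paper.
ELEMENT_PATTERNS = {
    "TATA box": re.compile(r"TATA[AT]A[AT]"),
    "GC box": re.compile(r"GGGCGG|CCGCCC"),
    "CAAT box": re.compile(r"CCAAT|ATTGG"),
}


def percent(count, total):
    """Return count/total as a whole-number percentage (0 if total is 0)."""
    return round(100 * count / total) if total else 0


def element_frequencies(pprs):
    """Count how many PPR sequences contain each element at least once.

    Returns a dict mapping element name -> (count, percent).
    """
    total = len(pprs)
    result = {}
    for name, pattern in ELEMENT_PATTERNS.items():
        count = sum(1 for seq in pprs if pattern.search(seq.upper()))
        result[name] = (count, percent(count, total))
    return result


if __name__ == "__main__":
    # Toy sequences invented for this example; not data from the paper.
    toy_pprs = ["GGTATAAAAGGC", "AAGGGCGGTT", "TTCCAATGG", "ACGTACGT"]
    for element, (count, pct) in element_frequencies(toy_pprs).items():
        print(f"{element}: {count}/{len(toy_pprs)} ({pct}%)")
```

The same percent helper reproduces the rounding reported in the abstract: for example, percent(329, 1031) gives 32 for TATA boxes and percent(493, 1031) gives 48 for PPRs located in CpG islands.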