Kimura Kouichi, Wakamatsu Ai, Suzuki Yutaka, Ota Toshio, Nishikawa Tetsuo, Yamashita Riu, Yamamoto Jun-ichi, Sekine Mitsuo, Tsuritani Katsuki, Wakaguri Hiroyuki, Ishii Shizuko, Sugiyama Tomoyasu, Saito Kaoru, Isono Yuko, Irie Ryotaro, Kushida Norihiro, Yoneyama Takahiro, Otsuka Rie, Kanda Katsuhiro, Yokoi Takahide, Kondo Hiroshi, Wagatsuma Masako, Murakawa Katsuji, Ishida Shinichi, Ishibashi Tadashi, Takahashi-Fujii Asako, Tanase Tomoo, Nagai Keiichi, Kikuchi Hisashi, Nakai Kenta, Isogai Takao, Sugano Sumio
Life Science Research Laboratory, Central Research Laboratory, Hitachi, Ltd., Kokubunji, Tokyo, 185-8601, Japan.
Genome Res. 2006 Jan;16(1):55-65. doi: 10.1101/gr.4039406. Epub 2005 Dec 12.
By analyzing 1,780,295 5'-end sequences of human full-length cDNAs derived from 164 kinds of oligo-cap cDNA libraries, we identified 269,774 independent positions of transcriptional start sites (TSSs) for 14,628 human RefSeq genes. These TSSs were grouped into 30,964 clusters separated from each other by more than 500 bp, which are therefore very likely to constitute mutually distinct alternative promoters. To our surprise, at least 7674 (52%) human RefSeq genes were subject to regulation by putative alternative promoters (PAPs). On average, there were 3.1 PAPs per gene, in a ratio of one CpG-island-containing promoter to 2.6 CpG-less promoters. In 17% of the PAP-containing loci, tissue-specific use of the PAPs was observed. The richest tissue sources of the tissue-specific PAPs were testis and brain. It was also intriguing that PAP-containing promoters were enriched in genes encoding signal transduction-related proteins and were rarer in genes encoding extracellular proteins, possibly reflecting the varied functional requirements and the restricted expression of those categories of genes, respectively. The patterns of the first exons were highly diverse as well. On average, there were 7.7 different first-exon splicing types per locus, produced in part by the PAPs, suggesting that a wide variety of transcripts can be generated by this mechanism. Our findings suggest that the use of alternative promoters, and the consequent alternative use of first exons, plays a pivotal role in generating the complexity required for the highly elaborated molecular systems in humans.
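The abstract states that TSSs were grouped into clusters separated by more than 500 bp, with each cluster treated as a distinct promoter. A minimal sketch of one way to apply such a rule is single-linkage grouping on sorted coordinates. This is an illustration under stated assumptions, not the authors' pipeline: the function names `cluster_tss` and `genes_with_alternative_promoters` are hypothetical, all TSSs for a gene are assumed to lie on one chromosome and strand, and any additional filtering the study may have used is not modeled.

```python
from typing import Dict, List, Set


def cluster_tss(positions: List[int], max_gap: int = 500) -> List[List[int]]:
    """Group TSS coordinates by single-linkage on genomic distance.

    Hypothetical sketch: after sorting, a TSS joins the current cluster when it
    lies within max_gap bp of the previous TSS; a larger gap starts a new
    cluster, so neighbouring clusters end up more than max_gap bp apart.
    Assumes all positions are on the same chromosome and strand.
    """
    clusters: List[List[int]] = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def genes_with_alternative_promoters(
    tss_by_gene: Dict[str, List[int]], max_gap: int = 500
) -> Set[str]:
    """Return genes whose TSSs form more than one cluster (hypothetical PAP rule)."""
    return {
        gene
        for gene, positions in tss_by_gene.items()
        if len(cluster_tss(positions, max_gap)) > 1
    }
```

For example, `cluster_tss([1000, 1020, 1400, 2100, 2150, 9000])` returns `[[1000, 1020, 1400], [2100, 2150], [9000]]`: the 380 bp step from 1020 to 1400 stays within one cluster, while the 700 bp step from 1400 to 2100 opens a new one. Under this assumed rule, a gene with two or more clusters would be counted as having putative alternative promoters.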