Bajic Vladimir B, Tan Sin Lam, Christoffels Alan, Schönbach Christian, Lipovich Leonard, Yang Liang, Hofmann Oliver, Kruger Adele, Hide Winston, Kai Chikatoshi, Kawai Jun, Hume David A, Carninci Piero, Hayashizaki Yoshihide
Knowledge Extraction Laboratory, Institute for Infocomm Research, Singapore.
PLoS Genet. 2006 Apr;2(4):e54. doi: 10.1371/journal.pgen.0020054. Epub 2006 Apr 28.
Using the two largest collections of Mus musculus and Homo sapiens transcription start sites (TSSs), determined from CAGE tags, ditags, full-length cDNAs, and other transcript data, we describe the compositional landscape surrounding TSSs with the aim of gaining better insight into the properties of mammalian promoters. We classified TSSs into four types based on the compositional properties of the regions immediately surrounding them. These properties highlighted distinctive features in the extended core promoters that helped us delineate the boundaries of the transcription initiation domain space for both species. The TSS types were analyzed for associations with initiating dinucleotides, CpG islands, TATA boxes, and an extensive collection of statistically significant cis-elements in mouse and human. We found that different TSS types show preferences for different sets of initiating dinucleotides and cis-elements. Through Gene Ontology and eVOC categories and tissue expression libraries, we linked TSS characteristics to expression. Moreover, using immune-response-related genes (GO:0006955) as an example, we show that TSS characteristics are linked to a very specific genomic organization. Our results shed light on previously unreported global properties of the two transcriptomes and thus provide a framework for better understanding the transcriptional mechanisms in the two species, as well as for developing new, more efficient promoter- and gene-finding tools.
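To make the compositional measures named in the abstract concrete, here is a minimal sketch. It uses hypothetical helpers and is not the authors' four-type classification, which the abstract does not specify. The sketch extracts the initiating dinucleotide (the bases at positions -1 and +1 of a TSS) and computes the GC fraction and CpG observed/expected ratio of a sequence window. It assumes plus-strand sequence with the +1 base at 0-based index `tss`, so minus-strand TSSs would first need to be reverse-complemented. The CpG-island thresholds (length >= 200 bp, GC >= 0.5, observed/expected >= 0.6) are the commonly cited Gardiner-Garden and Frommer criteria and are assumed here as adjustable defaults.

```python
from collections import Counter


def initiating_dinucleotide(seq: str, tss: int) -> str:
    """Return the -1/+1 dinucleotide of a TSS.

    Assumption: `seq` is plus-strand and `tss` is the 0-based index of the
    +1 base (TSS-relative coordinates have no position 0).
    """
    if tss < 1 or tss >= len(seq):
        raise ValueError("TSS needs one base on each side")
    return seq[tss - 1 : tss + 1].upper()


def tally_initiating(records) -> Counter:
    """Count initiating dinucleotides over (sequence, tss_index) pairs."""
    return Counter(initiating_dinucleotide(s, t) for s, t in records)


def composition(window: str) -> dict:
    """GC fraction and CpG observed/expected ratio of a window.

    CpG o/e = (#CpG * length) / (#C * #G); defined as 0.0 when C or G is absent.
    """
    s = window.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc = (c + g) / n if n else 0.0
    oe = cpg * n / (c * g) if c and g else 0.0
    return {"gc": gc, "cpg_oe": oe}


def is_cpg_island_like(window: str, min_len: int = 200,
                       min_gc: float = 0.5, min_oe: float = 0.6) -> bool:
    """Apply assumed Gardiner-Garden/Frommer-style thresholds to one window."""
    comp = composition(window)
    return (len(window) >= min_len
            and comp["gc"] >= min_gc
            and comp["cpg_oe"] >= min_oe)


if __name__ == "__main__":
    # Toy sequences for illustration only, not data from the study.
    print(initiating_dinucleotide("TTTCAGG", 4))  # prints CA
    print(composition("CGCG"))                     # gc 1.0, cpg_oe 2.0
    print(is_cpg_island_like("CG" * 100))          # prints True
```

A real analysis would take windows from genome coordinates around each TSS and scan them with a sliding window rather than scoring a single fixed window. The helpers are kept separate so that dinucleotide tallies and composition scores can be tested per TSS type independently.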