Department of Computational Biology, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa 277-0882, Japan.
Genome Res. 2013 Aug;23(8):1348-61. doi: 10.1101/gr.151571.112. Epub 2013 May 1.
More than half of Caenorhabditis elegans pre-mRNAs lose their original 5' ends in a process termed "trans-splicing," in which the "outron," the RNA extending from the transcription start site (TSS) to the trans-splice site of the primary transcript, is replaced with a 22-nt spliced leader. This complicates the mapping of TSSs, leading to a lack of available TSS mapping data for these genes. We used growth at low temperature and nuclear isolation to enrich for transcripts still containing outrons, applying a modified SAGE capture procedure and high-throughput sequencing to characterize 5' termini in this transcript population. We report from these data both a landscape of 5'-end utilization for C. elegans and a representative collection of TSSs for 7351 trans-spliced genes. TSS distributions for individual genes were often dispersed, with a greater average number of TSSs for trans-spliced genes, suggesting that trans-splicing may remove selective pressure for a single TSS. Upstream of newly defined TSSs, we observed well-known motifs (including TATAA-box and SP1) as well as novel motifs. Several of these motifs showed association with tissue-specific expression and/or conservation among six worm species. Comparing TSS features between trans-spliced and non-trans-spliced genes, we found stronger signals among outron TSSs for preferential positioning of flanking nucleosomes and for downstream Pol II enrichment. Our data provide an enabling resource for both experimental and theoretical analysis of gene structure and function in C. elegans.