Max Delbrück Center for Molecular Medicine, 13125, Berlin, Germany.
German Cancer Research Center, 69120, Heidelberg, Germany.
Nat Commun. 2019 Nov 1;10(1):5009. doi: 10.1038/s41467-019-13037-0.
Gene annotation is a critical resource in genomics research. Many computational approaches have been developed to assemble transcriptomes from high-throughput short-read sequencing data, but their accuracy remains limited. Here, we combine next-generation and third-generation sequencing to reconstruct a full-length transcriptome in the rat hippocampus, which is further validated using independent 5′- and 3′-end profiling approaches. In total, we detect 28,268 full-length transcripts (FLTs), covering 6,380 RefSeq genes and 849 unannotated loci. Based on these FLTs, we discover co-occurring alternative RNA processing events. Integrating the FLTs with polysome profiling and ribosome footprinting data, we predict isoform-specific translational status and reconstruct an open reading frame (ORF)-eome. Notably, a high proportion of the predicted ORFs are validated by mass spectrometry-based proteomics. Moreover, we identify isoforms that show subcellular localization patterns in neurons. Collectively, our data advance our knowledge of RNA and protein isoform diversity in the rat brain and provide a rich resource for functional studies.