Chuang Trees-Juen, Wu Chan-Shuo, Chen Chia-Ying, Hung Li-Yuan, Chiang Tai-Wei, Yang Min-Yu
Division of Physical and Computational Genomics, Genomics Research Center, Academia Sinica, Taipei 11529, Taiwan
Nucleic Acids Res. 2016 Feb 18;44(3):e29. doi: 10.1093/nar/gkv1013. Epub 2015 Oct 5.
Analysis of RNA-seq data often detects numerous 'non-co-linear' (NCL) transcripts, which comprise sequence segments that are topologically inconsistent with their corresponding DNA sequences in the reference genome. However, detection of NCL transcripts involves two major challenges: removal of false positives arising from alignment artifacts, and discrimination between different types of NCL transcripts (trans-spliced, circular or fusion transcripts). Here, we developed a new NCL-transcript-detecting method ('NCLscan'), which uses a stepwise alignment strategy to almost completely eliminate false calls (>98% precision) without sacrificing true positives. This enables NCLscan to outperform 18 other publicly available tools (including fusion- and circular-RNA-detecting tools) in terms of sensitivity and precision, regardless of the generation strategy of the simulated dataset, the type of intragenic or intergenic NCL event, the read depth, the read length or the expression level of the NCL transcript. Given this high accuracy, we applied NCLscan to distinguish between trans-spliced, circular and fusion transcripts on the basis of poly(A)- and nonpoly(A)-selected RNA-seq data. We showed that circular RNAs were expressed more ubiquitously, more abundantly and less cell type-specifically than trans-spliced and fusion transcripts. Our study thus describes a robust pipeline for the discovery of NCL transcripts, and sheds light on the fundamental biology of these non-canonical RNA events in the human transcriptome.