Department of Computer Science, University of Maryland, College Park, Maryland, USA.
Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA.
BMC Bioinformatics. 2019 Aug 13;20(1):421. doi: 10.1186/s12859-019-2947-6.
Ultra-fast pseudo-alignment approaches are the tool of choice in transcript-level RNA sequencing (RNA-seq) analyses. Unfortunately, these methods couple the tasks of pseudo-alignment and transcript quantification. This coupling precludes the direct use of pseudo-alignment in other expression analyses, such as alternative splicing or differential gene expression analysis, unless a transcript quantification step that those analyses do not otherwise need is included.
In this paper, we introduce a transcriptome segmentation approach to decouple these two tasks. Given a transcriptome reference library, we propose an efficient algorithm that generates maximal disjoint segments, on which ultra-fast pseudo-alignment can be used to produce per-sample segment counts. We show how to apply these maximally unambiguous count statistics in two specific expression analyses, alternative splicing and differential gene expression, without the need for a transcript quantification step. Our experiments on simulated and experimental data show that segment counts, like other methods that rely on local coverage statistics, provide an advantage over approaches that rely on transcript quantification in detecting and correctly estimating local splicing when transcript annotations are incomplete.
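To make the idea of disjoint segments concrete, here is a minimal Python sketch under simplifying assumptions. It is not the algorithm implemented in Yanagi, whose details this abstract does not give. It assumes each transcript is a list of half-open exon intervals on a single shared coordinate axis, and it splits the annotated region into maximal intervals over which the set of covering transcripts is constant. The function name `disjoint_segments` and the toy transcripts `t1` and `t2` are hypothetical.

```python
def disjoint_segments(transcripts):
    """Split exonic regions into maximal disjoint segments.

    transcripts: dict mapping a transcript name to a list of half-open
    (start, end) exon intervals on one coordinate axis (an assumption
    of this sketch, not a format taken from the paper).
    Returns a list of (start, end, frozenset_of_transcript_names).
    """
    # Every exon boundary is a potential segment boundary.
    points = sorted({p for exons in transcripts.values()
                     for s, e in exons for p in (s, e)})
    segments = []
    for start, end in zip(points, points[1:]):
        # Transcripts with an exon that fully covers this elementary interval.
        owners = frozenset(
            name for name, exons in transcripts.items()
            if any(s <= start and end <= e for s, e in exons)
        )
        if not owners:
            continue  # intronic or intergenic gap in every transcript
        # Merge with the previous segment when adjacent and shared by the
        # same transcripts, so that segments stay maximal.
        if segments and segments[-1][1] == start and segments[-1][2] == owners:
            segments[-1] = (segments[-1][0], end, owners)
        else:
            segments.append((start, end, owners))
    return segments


# Hypothetical toy gene: t2 extends the second exon of t1 by 50 bases upstream.
example = {
    "t1": [(0, 100), (200, 300)],
    "t2": [(0, 100), (150, 300)],
}
for seg in disjoint_segments(example):
    print(seg[0], seg[1], sorted(seg[2]))
```

In this toy case the interval from 150 to 200 is assigned to `t2` alone, so reads counted there point to the longer exon form without first estimating the abundance of either transcript. That is the kind of local, unambiguous count the text describes.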
The transcriptome segmentation approach implemented in Yanagi exploits the computational and space efficiency of pseudo-alignment approaches. It significantly expands their applicability and interpretability in a variety of RNA-seq analyses by providing the means to model and capture local coverage variation in these analyses.