Université de Lyon, ENS de Lyon, Université Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology and Modelling of the Cell, 46 Allée d'Italie Site Jacques Monod, F-69007, Lyon, France.
IRISA Inria Rennes Bretagne Atlantique CNRS UMR 6074, Université Rennes 1, GenScale team, Rennes, 263 Avenue Général Leclerc, Rennes, France.
Sci Rep. 2018 Mar 9;8(1):4307. doi: 10.1038/s41598-018-21770-7.
Genome-wide analyses estimate that more than 90% of multi-exonic human genes produce at least two transcripts through alternative splicing (AS). Various bioinformatics methods are available to analyze AS from RNAseq data. Most methods start by mapping the reads to an annotated reference genome, but some start with a de novo assembly of the reads. In this paper, we present a systematic comparison of a mapping-first approach (FARLINE) and an assembly-first approach (KISSPLICE). We applied these methods to two independent RNAseq datasets and found that the predictions of the two pipelines overlapped (70% of exon skipping events were common), but with noticeable differences. The assembly-first approach found more novel variants, including novel unannotated exons and splice sites. It also predicted AS in recently duplicated genes. The mapping-first approach found more lowly expressed splicing variants and more splicing variants overlapping repeats. This work demonstrates that annotating AS with a single approach misses a large number of candidates, many of which are differentially regulated across conditions and can be validated experimentally. We therefore advocate the combined use of mapping-first and assembly-first approaches for the annotation and differential analysis of AS from RNAseq datasets.
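An overlap figure such as the 70% of shared exon skipping events comes down to set arithmetic once both pipelines report events in a common coordinate system. The Python sketch below is hypothetical and is not the authors' procedure. It makes several assumptions. First, events from both pipelines have already been placed on the reference genome. Second, each event is keyed by the coordinates in the hypothetical `ExonSkippingEvent` tuple, and only exact coordinate matches count as shared. Third, the shared fraction is computed over the union of events (the Jaccard index). The abstract does not say which denominator was used.

```python
from typing import Iterable, NamedTuple


class ExonSkippingEvent(NamedTuple):
    """Hypothetical key for an exon skipping event: the skipped exon plus
    its flanking exon boundaries, in reference-genome coordinates."""
    chrom: str
    strand: str
    upstream_exon_end: int
    skipped_exon_start: int
    skipped_exon_end: int
    downstream_exon_start: int


def overlap_fraction(a: Iterable[ExonSkippingEvent],
                     b: Iterable[ExonSkippingEvent]) -> float:
    """Share of all distinct events that both pipelines report (Jaccard index).

    Events match only if every coordinate is identical; the choice of the
    union as denominator is an assumption, not taken from the study.
    """
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # no events from either pipeline
    return len(set_a & set_b) / len(union)


# Toy usage with made-up coordinates (not data from the study).
e1 = ExonSkippingEvent("chr1", "+", 1000, 1200, 1300, 1500)
e2 = ExonSkippingEvent("chr1", "+", 2000, 2100, 2150, 2400)
e3 = ExonSkippingEvent("chr2", "-", 500, 700, 760, 900)
mapping_first_events = {e1, e2}    # e.g. calls from a mapping-first pipeline
assembly_first_events = {e1, e3}   # e.g. calls from an assembly-first pipeline
print(round(overlap_fraction(mapping_first_events, assembly_first_events), 3))  # 0.333
```

Exact coordinate matching is the strictest possible criterion. A real comparison might allow small tolerances at splice sites, or might normalise by one pipeline's event count instead of the union. Either choice would change the reported percentage.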