Department of Mathematics, Josip Juraj Strossmayer University of Osijek, Osijek 31000, Croatia.
Gene Center, Ludwig-Maximilians-Universität München, Munich 81377, Germany.
Bioinformatics. 2023 Jul 1;39(7). doi: 10.1093/bioinformatics/btad419.
Alternative splicing (AS) of introns from pre-mRNA produces diverse sets of transcripts across cell types and tissues, but is also dysregulated in many diseases. Alignment-free computational methods have greatly accelerated the quantification of mRNA transcripts from short RNA-seq reads, but they inherently rely on a catalog of known transcripts and might miss novel, disease-specific splicing events. By contrast, alignment of reads to the genome can effectively identify novel exonic segments and introns. Event-based methods then count how many reads align to predefined features. However, alignment is more expensive to compute and constitutes a bottleneck in many AS analysis methods.
Here, we propose fortuna, a method that guesses novel combinations of annotated splice sites to create transcript fragments. It then pseudoaligns reads to fragments using kallisto and efficiently derives counts of the most elementary splicing units from kallisto's equivalence classes. These counts can be used directly for AS analysis or summarized into the larger units used by other widely applied methods. In experiments on synthetic and real data, fortuna was around 7× faster than traditional align-and-count approaches and analyzed almost 300 million reads in just 15 min using four threads. It mapped reads containing mismatches more accurately across novel junctions than existing methods did, and it found more reads supporting aberrant splicing events in patients with autism spectrum disorder. We further used fortuna to identify novel, tissue-specific splicing events in Drosophila.
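The two ideas named above, combining annotated splice sites into candidate junctions and turning equivalence-class counts into counts of elementary units, can be illustrated with a minimal sketch. This is not fortuna's implementation: the function names, the data layout, and the rule that a unit is credited only when every fragment in an equivalence class contains it are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def candidate_junctions(donors, acceptors, annotated):
    """Hypothetical helper: pair each annotated donor with each downstream
    annotated acceptor and flag pairs missing from the annotation as novel."""
    pairs = []
    for donor, acceptor in product(sorted(donors), sorted(acceptors)):
        if donor < acceptor:
            is_novel = (donor, acceptor) not in annotated
            pairs.append(((donor, acceptor), is_novel))
    return pairs


def unit_counts(eq_classes, fragment_units):
    """Hypothetical helper: aggregate read counts per elementary unit.

    eq_classes     -- iterable of (set of fragment ids, read count)
    fragment_units -- dict mapping fragment id -> set of elementary units

    Assumption: a unit receives the reads of a class only if all
    fragments in that class contain the unit.
    """
    counts = Counter()
    for fragments, n_reads in eq_classes:
        if not fragments:
            continue  # an empty class carries no fragment information
        shared = set.intersection(*(fragment_units[f] for f in fragments))
        for unit in shared:
            counts[unit] += n_reads
    return counts


# Toy data: two donors, two acceptors, two annotated introns.
junctions = candidate_junctions(
    donors={100, 300},
    acceptors={200, 400},
    annotated={(100, 200), (300, 400)},
)

# Toy data: two fragments that share only the unit "e1".
toy_units = {"f1": {"e1", "j1", "e2"}, "f2": {"e1", "j2", "e3"}}
toy_counts = unit_counts([({"f1"}, 5), ({"f1", "f2"}, 3)], toy_units)
```

In the toy data, the donor at 100 paired with the acceptor at 400 is the only combination flagged as novel, and reads from the class compatible with both fragments are credited only to the shared unit `e1`.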
fortuna source code is available at https://github.com/canzarlab/fortuna.