Counting pseudoalignments to novel splicing events.

Affiliations

Department of Mathematics, Josip Juraj Strossmayer University of Osijek, Osijek 31000, Croatia.

Gene Center, Ludwig-Maximilians-Universität München, Munich 81377, Germany.

Publication information

Bioinformatics. 2023 Jul 1;39(7). doi: 10.1093/bioinformatics/btad419.

DOI:10.1093/bioinformatics/btad419

PMID:37432342

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10348833/

Abstract

MOTIVATION

Alternative splicing (AS) of introns from pre-mRNA produces diverse sets of transcripts across cell types and tissues, but is also dysregulated in many diseases. Alignment-free computational methods have greatly accelerated the quantification of mRNA transcripts from short RNA-seq reads, but they inherently rely on a catalog of known transcripts and might miss novel, disease-specific splicing events. By contrast, alignment of reads to the genome can effectively identify novel exonic segments and introns. Event-based methods then count how many reads align to predefined features. However, an alignment is more expensive to compute and constitutes a bottleneck in many AS analysis methods.

RESULTS

Here, we propose fortuna, a method that guesses novel combinations of annotated splice sites to create transcript fragments. It then pseudoaligns reads to fragments using kallisto and efficiently derives counts of the most elementary splicing units from kallisto's equivalence classes. These counts can be directly used for AS analysis or summarized to larger units as used by other widely applied methods. In experiments on synthetic and real data, fortuna was around 7× faster than traditional align and count approaches, and was able to analyze almost 300 million reads in just 15 min when using four threads. It mapped reads containing mismatches more accurately across novel junctions and found more reads supporting aberrant splicing events in patients with autism spectrum disorder than existing methods. We further used fortuna to identify novel, tissue-specific splicing events in Drosophila.
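The two ideas in this paragraph, proposing unannotated pairings of annotated splice sites and turning equivalence-class counts into counts of elementary units, can be illustrated with a toy sketch. This is not fortuna's implementation: the function names, the input data structures, and the rule of crediting a read only to units shared by every fragment in its equivalence class are all simplifying assumptions made here for illustration.

```python
from collections import Counter
from itertools import product


def novel_junctions(donors, acceptors, annotated):
    """Enumerate donor/acceptor pairs that are not annotated introns.

    Hypothetical helper: positions are plain integers on one strand,
    and a pair is only considered if the donor lies upstream of the acceptor.
    """
    return sorted(
        (d, a)
        for d, a in product(donors, acceptors)
        if d < a and (d, a) not in annotated
    )


def unit_counts(eq_classes, fragment_units):
    """Aggregate equivalence-class read counts into per-unit counts.

    eq_classes: dict mapping a frozenset of fragment ids to a read count.
    fragment_units: dict mapping a fragment id to the set of elementary
        units (e.g. exonic segments, junctions) that the fragment contains.

    Assumption for this sketch: a read supports exactly those units that
    are present in every fragment of its equivalence class.
    """
    counts = Counter()
    for fragments, n_reads in eq_classes.items():
        if not fragments:
            continue  # skip an empty class rather than fail on it
        shared = set.intersection(*(set(fragment_units[f]) for f in fragments))
        for unit in shared:
            counts[unit] += n_reads
    return counts


# Toy gene with two annotated introns, (100, 200) and (300, 400).
annotated_introns = {(100, 200), (300, 400)}
candidates = novel_junctions([100, 300], [200, 400], annotated_introns)

# Two toy fragments that share only the first exonic segment "e1".
fragment_units = {
    "f1": {"e1", "j12", "e2"},
    "f2": {"e1", "j13", "e3"},
}
eq_classes = {
    frozenset({"f1"}): 5,        # reads compatible with f1 only
    frozenset({"f1", "f2"}): 3,  # reads compatible with both fragments
}
counts = unit_counts(eq_classes, fragment_units)
```

In this toy input, the only unannotated pairing is the junction skipping from position 100 to 400, and reads that are ambiguous between the two fragments contribute only to the shared segment `e1`. A real pipeline would read the equivalence classes from kallisto's output rather than from a hand-written dictionary.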

AVAILABILITY AND IMPLEMENTATION

fortuna source code is available at https://github.com/canzarlab/fortuna.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/289a/10348833/b5dbdf27b007/btad419f1.jpg
