Tiberi Simone, Meili Joël, Cai Peiying, Soneson Charlotte, He Dongze, Sarkar Hirak, Avalos-Pacheco Alejandra, Patro Rob, Robinson Mark D
Department of Statistical Sciences, University of Bologna, Bologna, Italy.
Department of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland.
bioRxiv. 2024 Jun 10:2023.08.17.553679. doi: 10.1101/2023.08.17.553679.
Although transcriptomics data is typically used to analyse mature spliced mRNA, recent attention has focused on jointly investigating spliced and unspliced (or precursor-) mRNA, which can be used to study gene regulation and changes in gene expression production. Nonetheless, most methods for spliced/unspliced inference (such as RNA velocity tools) focus on individual samples, and rarely allow comparisons between groups of samples (e.g., healthy vs. diseased). Furthermore, this kind of inference is challenging, because spliced and unspliced mRNA abundance is characterized by a high degree of quantification uncertainty, due to the prevalence of multi-mapping reads, i.e., reads compatible with multiple transcripts (or genes), and/or with both their spliced and unspliced versions.
Here, we present a Bayesian hierarchical method to discover changes between experimental conditions with respect to the relative abundance of unspliced mRNA (over the total mRNA). We model the quantification uncertainty via a latent variable approach, where reads are allocated to their gene/transcript of origin, and to the respective splice version. We designed several benchmarks where our approach shows good performance, in terms of sensitivity and error control, versus state-of-the-art competitors. Importantly, our tool is flexible, and works with both bulk and single-cell RNA-sequencing data.
The method is distributed as a Bioconductor R package.
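The latent variable idea described above can be illustrated with a deliberately simplified sketch. It is not the authors' model, which is hierarchical across samples and also allocates reads across genes and transcripts. The toy below assumes a single gene in a single sample, a uniform Beta(1, 1) prior on the unspliced fraction, and ambiguous reads that are equally compatible with the spliced and unspliced versions. The function names `gibbs_unspliced_fraction` and `prob_increase`, and all read counts, are hypothetical.

```python
import random


def gibbs_unspliced_fraction(n_spliced, n_unspliced, n_ambiguous,
                             n_iter=5000, burn_in=500, seed=0):
    """Toy Gibbs sampler for the unspliced fraction pi of one gene.

    Assumptions (illustrative only): Beta(1, 1) prior on pi, and each
    ambiguous read is equally likely under the spliced and unspliced
    versions, so it is allocated to "unspliced" with probability pi.
    """
    rng = random.Random(seed)
    pi = 0.5  # arbitrary starting value
    samples = []
    for it in range(n_iter + burn_in):
        # Latent allocation step: assign each ambiguous read to a splice version.
        z = sum(rng.random() < pi for _ in range(n_ambiguous))
        # Conjugate update of pi given the completed (allocated) counts.
        pi = rng.betavariate(1 + n_unspliced + z,
                             1 + n_spliced + n_ambiguous - z)
        if it >= burn_in:
            samples.append(pi)
    return samples


def prob_increase(samples_a, samples_b):
    """Posterior probability that pi is larger in condition B than in A,
    estimated by pairing the posterior draws of the two conditions."""
    pairs = list(zip(samples_a, samples_b))
    return sum(b > a for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical read counts: (spliced, unspliced, ambiguous) per condition.
    cond_a = gibbs_unspliced_fraction(60, 20, 40, seed=1)
    cond_b = gibbs_unspliced_fraction(40, 40, 40, seed=2)
    print("mean pi, condition A:", sum(cond_a) / len(cond_a))
    print("mean pi, condition B:", sum(cond_b) / len(cond_b))
    print("P(pi_B > pi_A):", prob_increase(cond_a, cond_b))
```

Under these simplifying assumptions the ambiguous reads carry no information about the unspliced fraction, so the sampler should recover the Beta posterior implied by the uniquely assigned reads alone. The allocation step still shows how quantification uncertainty is propagated into the posterior rather than fixed by a single point estimate.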