Forseti: a mechanistic and predictive model of the splicing status of scRNA-seq reads.

Affiliations

Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, United States.

Program in Computational Biology, Bioinformatics and Genomics, University of Maryland, College Park, MD 20742, United States.

Publication information

Bioinformatics. 2024 Jun 28;40(Suppl 1):i297-i306. doi: 10.1093/bioinformatics/btae207.

DOI:10.1093/bioinformatics/btae207

PMID:38940130

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11256924/

Abstract

MOTIVATION

Short-read single-cell RNA-sequencing (scRNA-seq) has been used to study cellular heterogeneity, cellular fate, and transcriptional dynamics. Modeling splicing dynamics in scRNA-seq data is challenging, with inherent difficulty in even the seemingly straightforward task of elucidating the splicing status of the molecules from which sequenced fragments are drawn. This difficulty arises, in part, from the limited read length and positional biases, which substantially reduce the specificity of the sequenced fragments. As a result, the splicing status of many reads in scRNA-seq is ambiguous because of a lack of definitive evidence. We are therefore in need of methods that can recover the splicing status of ambiguous reads which, in turn, can lead to more accuracy and confidence in downstream analyses.

RESULTS

We develop Forseti, a predictive model to probabilistically assign a splicing status to scRNA-seq reads. Our model has two key components. First, we train a binding affinity model to assign a probability that a given transcriptomic site is used in fragment generation. Second, we fit a robust fragment length distribution model that generalizes well across datasets deriving from different species and tissue types. Forseti combines these two trained models to predict the splicing status of the molecule of origin of reads by scoring putative fragments that associate each alignment of sequenced reads with proximate potential priming sites. Using both simulated and experimental data, we show that our model can precisely predict the splicing status of many reads and identify the true gene origin of multi-gene mapped reads.
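
The scoring idea described above can be illustrated with a minimal sketch. This is not the Forseti implementation: the function names, the Gaussian stand-in for the fitted fragment length distribution, the fixed binding probabilities, and the equal prior on the two splicing states are all assumptions made for illustration. Each alignment is scored by summing, over nearby candidate priming sites, the probability that the site is used multiplied by the probability of the implied fragment length; the scores for the spliced and unspliced alignments are then normalized.

```python
from math import exp, pi, sqrt


def fragment_length_prob(length, mean=300.0, sd=100.0):
    """Toy Gaussian density; a hypothetical stand-in for a fitted
    fragment length distribution (parameters are made up)."""
    if length <= 0:
        return 0.0
    z = (length - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2.0 * pi))


def alignment_score(read_start, priming_sites):
    """Score one alignment of a read.

    priming_sites is a list of (site_position, binding_probability) pairs
    downstream of the read; each pair defines one putative fragment that
    runs from the read start to the priming site.
    """
    return sum(
        p_bind * fragment_length_prob(site_pos - read_start)
        for site_pos, p_bind in priming_sites
    )


def splicing_posterior(spliced_score, unspliced_score):
    """Normalize the two scores, assuming equal prior weight on each state."""
    total = spliced_score + unspliced_score
    if total == 0.0:
        return {"spliced": 0.5, "unspliced": 0.5}
    return {
        "spliced": spliced_score / total,
        "unspliced": unspliced_score / total,
    }


# Hypothetical read: on the spliced transcript it lies 300 bases upstream of
# a strong priming site; on the unspliced transcript the nearest candidate
# site is 1500 bases away and binds weakly.
spliced = alignment_score(1000, [(1300, 0.9)])
unspliced = alignment_score(5000, [(6500, 0.3)])
posterior = splicing_posterior(spliced, unspliced)
print(posterior)
```

In this toy case the spliced alignment implies a plausible fragment length next to a likely priming site, so nearly all of the probability goes to the spliced state. A read whose two alignments imply similar fragment lengths and priming sites would stay close to 0.5, which is the ambiguous case the abstract describes.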

AVAILABILITY AND IMPLEMENTATION

Forseti and the code used for producing the results are available at https://github.com/COMBINE-lab/forseti under a BSD 3-clause license.
