Department of Mathematics, Imperial College London, London SW7 2BX, United Kingdom.
I-X Centre for AI in Science, Imperial College London, White City Campus, London W12 0BZ, United Kingdom.
Bioinformatics. 2023 Jul 1;39(7). doi: 10.1093/bioinformatics/btad395.
Gene expression is characterized by stochastic bursts of transcription that occur during brief, random periods of promoter activity. The kinetics of these bursts differ across the genome and depend on the promoter sequence, among other factors. Single-cell RNA sequencing (scRNA-seq) has made it possible to quantify the cell-to-cell variability in transcription genome-wide. However, scRNA-seq data are prone to technical variability, including low and variable capture efficiency of transcripts from individual cells.
Here, we propose a novel mathematical theory for the observed variability in scRNA-seq data. Our method captures burst kinetics together with variability in both cell size and capture efficiency, which allows us to propose several likelihood-based and simulation-based methods for inferring burst kinetics from scRNA-seq data. Using both synthetic and real data, we show that the simulation-based methods provide an accurate, robust and flexible tool for this inference. In particular, a supervised simulation-based inference method built on neural networks proves accurate and useful when applied to both allele-specific and non-allele-specific scRNA-seq data.
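To make the idea of simulation-based inference under variable capture efficiency concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified model that the abstract does not specify: true transcript counts follow a negative binomial distribution (a common bursty-limit approximation), each cell's capture is a binomial thinning with a known per-cell efficiency, cell-size variability is ignored, and inference is plain rejection ABC on three summary statistics. All function names, priors, and parameter values are hypothetical.

```python
import numpy as np

def simulate_counts(burst_freq, burst_size, capture_eff, rng):
    """Simulate observed counts for one gene across cells (assumed model).

    True counts: negative binomial with mean burst_freq * burst_size.
    Observed counts: binomial thinning by each cell's capture efficiency.
    """
    p = 1.0 / (1.0 + burst_size)
    true_counts = rng.negative_binomial(burst_freq, p, size=capture_eff.shape)
    return rng.binomial(true_counts, capture_eff)

def summary(counts):
    """Summary statistics: log mean, log variance, fraction of zeros."""
    return np.array([
        np.log1p(counts.mean()),
        np.log1p(counts.var()),
        (counts == 0).mean(),
    ])

def abc_rejection(observed, capture_eff, n_draws=2000, n_keep=40, seed=0):
    """Rejection ABC: keep the prior draws whose simulations are closest."""
    rng = np.random.default_rng(seed)
    s_obs = summary(observed)
    # Log-uniform priors (hypothetical ranges chosen for illustration).
    freq = 10 ** rng.uniform(-1, 1, n_draws)   # burst frequency in [0.1, 10]
    size = 10 ** rng.uniform(0, 2, n_draws)    # burst size in [1, 100]
    dist = np.empty(n_draws)
    for i in range(n_draws):
        sim = simulate_counts(freq[i], size[i], capture_eff, rng)
        dist[i] = np.linalg.norm(summary(sim) - s_obs)
    best = np.argsort(dist)[:n_keep]
    return freq[best], size[best]

# Synthetic example: 1000 cells with capture efficiency around 10%.
rng = np.random.default_rng(1)
capture_eff = rng.beta(2.0, 18.0, size=1000)
observed = simulate_counts(1.0, 20.0, capture_eff, rng)
post_freq, post_size = abc_rejection(observed, capture_eff)
print("posterior median burst frequency:", np.median(post_freq))
print("posterior median burst size:", np.median(post_size))
```

Because the simulator applies the same per-cell capture efficiencies to every candidate parameter set, the accepted draws reflect burst parameters of the true counts rather than of the thinned counts. A neural-network approach would replace the rejection step with a regressor trained on simulated pairs of summaries and parameters.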
The code for Neural Network and Approximate Bayesian Computation inference is available at https://github.com/WT215/nnRNA and https://github.com/WT215/Julia_ABC, respectively.